Analysis II


Terence Tao


HINDUSTAN 
BOOK AGENCY 


TEXTS AND READINGS
IN MATHEMATICS 38


Analysis II


Texts and Readings in Mathematics 


Advisory Editor 
C. S. Seshadri, Chennai Mathematical Inst., Chennai. 


Managing Editor 
Rajendra Bhatia, Indian Statistical Inst., New Delhi. 


Editors 

V. S. Borkar, Tata Inst. of Fundamental Research, Mumbai. 
Probal Chaudhuri, Indian Statistical Inst., Kolkata. 

R. L. Karandikar, Indian Statistical Inst., New Delhi. 

M. Ram Murty, Queen's University, Kingston. 

V.S. Sunder, Inst. of Mathematical Sciences, Chennai. 

M. Vanninathan, TIFR Centre, Bangalore. 

T.N. Venkataramana, Tata Inst. of Fundamental Research, Mumbai. 


Analysis II


Terence Tao 
University of California 


Los Angeles 


HINDUSTAN
BOOK AGENCY 


Published by 


Hindustan Book Agency (India) 
P 19 Green Park Extension 

New Delhi 110 016 

India 


email: hba@vsnl.com 
http://www.hindbook.com 


Copyright © 2006 by Hindustan Book Agency (India) 


No part of the material protected by this copyright notice may be 
reproduced or utilized in any form or by any means, electronic or
mechanical, including photocopying, recording or by any information 
storage and retrieval system, without written permission from the 
copyright owner, who has also the sole right to grant licences for 
translation into other languages and publication thereof. 


All export rights for this edition vest exclusively with Hindustan Book 
Agency (India). Unauthorized export is a violation of Copyright Law 
and is subject to legal action. 


Produced from camera ready copy supplied by the Author. 


ISBN 81-85931-63-1 


To my parents, for everything 


Contents 


Volume 1

Preface

1 Introduction

2 The natural numbers
2.1 The Peano axioms
2.2 Addition

3 Set theory
3.1 Fundamentals
3.2 Russell’s paradox (Optional)
3.3 Functions
3.4 Images and inverse images
3.5 Cartesian products
3.6 Cardinality of sets

4 Integers and rationals
4.1 The integers
4.2 The rationals
4.4 Gaps in the rational numbers

5 The real numbers
5.1 Cauchy sequences
5.2 Equivalent Cauchy sequences
5.3 The construction of the real numbers
5.4 Ordering the reals
5.5 The least upper bound property
5.6 Real exponentiation, part I

6 Limits of sequences
6.1 Convergence and limit laws
6.2 The extended real number system
6.3 Suprema and infima of sequences
6.4 Limsup, liminf, and limit points
6.5 Some standard limits
6.6 Subsequences
6.7 Real exponentiation, part II

7 Series
7.1 Finite series
7.2 Infinite series
7.3 Sums of non-negative numbers
7.4 Rearrangement of series
7.5 The root and ratio tests

8 Infinite sets
8.1 Countability
8.2 Summation on infinite sets
8.3 Uncountable sets
8.4 The axiom of choice
8.5 Ordered sets

9 Continuous functions on R
9.1 Subsets of the real line
9.2 The algebra of real-valued functions
9.3 Limiting values of functions
9.4 Continuous functions
9.5 Left and right limits
9.6 The maximum principle
9.7 The intermediate value theorem
9.8 Monotonic functions
9.9 Uniform continuity
9.10 Limits at infinity

10 Differentiation of functions
10.1 Basic definitions
10.2 Local maxima, local minima, and derivatives
10.3 Monotone functions and derivatives
10.4 Inverse functions and derivatives
10.5 L’Hôpital’s rule

11 The Riemann integral
11.1 Partitions
11.3 Upper and lower Riemann integrals
11.4 Basic properties of the Riemann integral
11.5 Riemann integrability of continuous functions
11.6 Riemann integrability of monotone functions
11.7 A non-Riemann integrable function
11.8 The Riemann-Stieltjes integral
11.9 The two fundamental theorems of calculus
11.10 Consequences of the fundamental theorems

A Appendix: the basics of mathematical logic
A.1 Mathematical statements
A.2 Implication
A.3 The structure of proofs
A.4 Variables and quantifiers
A.5 Nested quantifiers
A.6 Some examples of proofs and quantifiers
A.7 Equality

B Appendix: the decimal system
B.1 The decimal representation of natural numbers
B.2 The decimal representation of real numbers

Index


Volume 2

Preface

12 Metric spaces
12.1 Definitions and examples
12.2 Some point-set topology of metric spaces
12.3 Relative topology
12.4 Cauchy sequences and complete metric spaces
12.5 Compact metric spaces

13 Continuous functions on metric spaces
13.1 Continuous functions
13.2 Continuity and product spaces
13.3 Continuity and compactness
13.4 Continuity and connectedness
13.5 Topological spaces (Optional)

14 Uniform convergence
14.1 Limiting values of functions
14.2 Pointwise and uniform convergence
14.3 Uniform convergence and continuity
14.4 The metric of uniform convergence
14.5 Series of functions; the Weierstrass M-test
14.6 Uniform convergence and integration
14.7 Uniform convergence and derivatives
14.8 Uniform approximation by polynomials

15 Power series
15.1 Formal power series
15.2 Real analytic functions
15.3 Abel’s theorem
15.4 Multiplication of power series
15.5 The exponential and logarithm functions
15.6 A digression on complex numbers
15.7 Trigonometric functions

16 Fourier series
16.1 Periodic functions
16.2 Inner products on periodic functions
16.3 Trigonometric polynomials
16.4 Periodic convolutions

17 Several variable differential calculus
17.1 Linear transformations
17.2 Derivatives in several variable calculus
17.3 Partial and directional derivatives
17.4 The several variable calculus chain rule
17.5 Double derivatives and Clairaut’s theorem
17.6 The contraction mapping theorem
17.7 The inverse function theorem
17.8 The implicit function theorem

18 Lebesgue measure
18.1 The goal: Lebesgue measure
18.2 First attempt: Outer measure
18.3 Outer measure is not additive

19 Lebesgue integration
19.1 Simple functions
19.2 Integration of non-negative measurable functions
19.3 Integration of absolutely integrable functions
19.4 Comparison with the Riemann integral
19.5 Fubini’s theorem

Index


Preface 


This text originated from the lecture notes I gave teaching the ho- 
nours undergraduate-level real analysis sequence at the University 
of California, Los Angeles, in 2003. Among the undergraduates 
here, real analysis was viewed as being one of the most difficult 
courses to learn, not only because of the abstract concepts being 
introduced for the first time (e.g., topology, limits, measurability, 
etc.), but also because of the level of rigour and proof demanded 
of the course. Because of this perception of difficulty, one was 
often faced with the difficult choice of either reducing the level
of rigour in the course in order to make it easier, or maintaining
strict standards and facing the prospect of many undergraduates,
even many of the bright and enthusiastic ones, struggling with 
the course material. 

Faced with this dilemma, I tried a somewhat unusual approach 
to the subject. Typically, an introductory sequence in real analy- 
sis assumes that the students are already familiar with the real 
numbers, with mathematical induction, with elementary calculus, 
and with the basics of set theory, and then quickly launches into 
the heart of the subject, for instance the concept of a limit. Nor- 
mally, students entering this sequence do indeed have a fair bit 
of exposure to these prerequisite topics, though in most cases the 
material is not covered in a thorough manner. For instance, very 
few students were able to actually define a real number, or even 
an integer, properly, even though they could visualize these numbers
intuitively and manipulate them algebraically. This seemed
to me to be a missed opportunity. Real analysis is one of the first
subjects (together with linear algebra and abstract algebra) that 
a student encounters, in which one truly has to grapple with the 
subtleties of a truly rigorous mathematical proof. As such, the 
course offered an excellent chance to go back to the foundations 
of mathematics, and in particular the opportunity to do a proper 
and thorough construction of the real numbers. 


Thus the course was structured as follows. In the first week, 
I described some well-known “paradoxes” in analysis, in which 
standard laws of the subject (e.g., interchange of limits and sums, 
or sums and integrals) were applied in a non-rigorous way to give 
nonsensical results such as 0 = 1. This motivated the need to go 
back to the very beginning of the subject, even to the very defin- 
ition of the natural numbers, and check all the foundations from 
scratch. For instance, one of the first homework assignments was 
to check (using only the Peano axioms) that addition was associative
for natural numbers (i.e., that (a + b) + c = a + (b + c) for
all natural numbers a,b,c: see Exercise 2.2.1). Thus even in the 
first week, the students had to write rigorous proofs using math- 
ematical induction. After we had derived all the basic properties 
of the natural numbers, we then moved on to the integers (ini- 
tially defined as formal differences of natural numbers); once the 
students had verified all the basic properties of the integers, we 
moved on to the rationals (initially defined as formal quotients of 
integers); and then from there we moved on (via formal limits of 
Cauchy sequences) to the reals. Around the same time, we covered 
the basics of set theory, for instance demonstrating the uncount- 
ability of the reals. Only then (after about ten lectures) did we 
begin what one normally considers the heart of undergraduate real 
analysis - limits, continuity, differentiability, and so forth. 


The response to this format was quite interesting. In the first 
few weeks, the students found the material very easy on a con- 
ceptual level, as we were dealing only with the basic properties 
of the standard number systems. But on an intellectual level it 
was very challenging, as one was analyzing these number systems 
from a foundational viewpoint, in order to rigorously derive the
more advanced facts about these number systems from the more
primitive ones. One student told me how difficult it was to ex- 
plain to his friends in the non-honours real analysis sequence (a) 
why he was still learning how to show why all rational numbers 
are either positive, negative, or zero (Exercise 4.2.4), while the 
non-honours sequence was already distinguishing absolutely con- 
vergent and conditionally convergent series, and (b) why, despite 
this, he thought his homework was significantly harder than that 
of his friends. Another student commented to me, quite wryly, 
that while she could obviously see why one could always divide 
a natural number n into a positive integer q to give a quotient 
a and a remainder r less than q (Exercise 2.3.5), she still had, 
to her frustration, much difficulty in writing down a proof of this 
fact. (I told her that later in the course she would have to prove 
statements for which it would not be as obvious to see that the 
statements were true; she did not seem to be particularly consoled 
by this.) Nevertheless, these students greatly enjoyed the home- 
work, as when they did persevere and obtain a rigorous proof of 
an intuitive fact, it solidified the link in their minds between the
abstract manipulations of formal mathematics and their informal 
intuition of mathematics (and of the real world), often in a very 
satisfying way. By the time they were assigned the task of giv- 
ing the infamous “epsilon and delta” proofs in real analysis, they 
had already had so much experience with formalizing intuition, 
and in discerning the subtleties of mathematical logic (such as the 
distinction between the “for all” quantifier and the “there exists” 
quantifier), that the transition to these proofs was fairly smooth, 
and we were able to cover material both thoroughly and rapidly. 
By the tenth week, we had caught up with the non-honours class, 
and the students were verifying the change of variables formula for 
Riemann-Stieltjes integrals, and showing that piecewise continu- 
ous functions were Riemann integrable. By the conclusion of the 
sequence in the twentieth week, we had covered (both in lecture 
and in homework) the convergence theory of Taylor and Fourier 
series, the inverse and implicit function theorem for continuously 
differentiable functions of several variables, and established the
dominated convergence theorem for the Lebesgue integral.


In order to cover this much material, many of the key foun- 
dational results were left to the student to prove as homework; 
indeed, this was an essential aspect of the course, as it ensured 
the students truly appreciated the concepts as they were being in- 
troduced. This format has been retained in this text; the majority 
of the exercises consist of proving lemmas, propositions and theo- 
rems in the main text. Indeed, I would strongly recommend that 
one do as many of these exercises as possible - and this includes 
those exercises proving “obvious” statements - if one wishes to use 
this text to learn real analysis; this is not a subject whose sub- 
tleties are easily appreciated just from passive reading. Most of 
the chapter sections have a number of exercises, which are listed 
at the end of the section. 


To the expert mathematician, the pace of this book may seem 
somewhat slow, especially in early chapters, as there is a heavy 
emphasis on rigour (except for those discussions explicitly marked 
“Informal”), and justifying many steps that would ordinarily be
quickly passed over as being self-evident. The first few chapters 
develop (in painful detail) many of the “obvious” properties of the 
standard number systems, for instance that the sum of two posi- 
tive real numbers is again positive (Exercise 5.4.1), or that given 
any two distinct real numbers, one can find a rational number between
them (Exercise 5.4.5). In these foundational chapters, there
is also an emphasis on non-circularity - not using later, more ad- 
vanced results to prove earlier, more primitive ones. In particular, 
the usual laws of algebra are not used until they are derived (and 
they have to be derived separately for the natural numbers, inte- 
gers, rationals, and reals). The reason for this is that it allows the 
students to learn the art of abstract reasoning, deducing true facts 
from a limited set of assumptions, in the friendly and intuitive set- 
ting of number systems; the payoff for this practice comes later, 
when one has to utilize the same type of reasoning techniques to 
grapple with more advanced concepts (e.g., the Lebesgue integral). 


The text here evolved from my lecture notes on the subject, 
and thus is very much oriented towards a pedagogical perspective;
much of the key material is contained inside exercises, and
in many cases I have chosen to give a lengthy and tedious, but in- 
structive, proof instead of a slick abstract proof. In more advanced 
textbooks, the student will see shorter and more conceptually co- 
herent treatments of this material, and with more emphasis on 
intuition than on rigour; however, I feel it is important to know 
how to do analysis rigorously and “by hand” first, in order to truly 
appreciate the more modern, intuitive and abstract approach to 
analysis that one uses at the graduate level and beyond. 


The exposition in this book heavily emphasizes rigour and 
formalism; however this does not necessarily mean that lectures 
based on this book have to proceed the same way. Indeed, in my 
own teaching I have used the lecture time to present the intuition 
behind the concepts (drawing many informal pictures and giving 
examples), thus providing a complementary viewpoint to the for- 
mal presentation in the text. The exercises assigned as homework 
provide an essential bridge between the two, requiring the student 
to combine both intuition and formal understanding together in 
order to locate correct proofs for a problem. This I found to be 
the most difficult task for the students, as it requires the subject 
to be genuinely learnt, rather than merely memorized or vaguely 
absorbed. Nevertheless, the feedback I received from the students 
was that the homework, while very demanding for this reason, 
was also very rewarding, as it allowed them to connect the rather 
abstract manipulations of formal mathematics with their innate 
intuition on such basic concepts as numbers, sets, and functions. 
Of course, the aid of a good teaching assistant is invaluable in 
achieving this connection. 


With regard to examinations for a course based on this text, 
I would recommend either an open-book, open-notes examination 
with problems similar to the exercises given in the text (but per- 
haps shorter, with no unusual trickery involved), or else a take- 
home examination that involves problems comparable to the more 
intricate exercises in the text. The subject matter is too vast to 
force the students to memorize the definitions and theorems, so 
I would not recommend a closed-book examination, or an examination
based on regurgitating extracts from the book. (Indeed, in
my own examinations I gave a supplemental sheet listing the key 
definitions and theorems which were relevant to the examination 
problems.) Making the examinations similar to the homework as- 
signed in the course will also help motivate the students to work 
through and understand their homework problems as thoroughly 
as possible (as opposed to, say, using flash cards or other such de- 
vices to memorize material), which is good preparation not only 
for examinations but for doing mathematics in general. 

Some of the material in this textbook is somewhat periph- 
eral to the main theme and may be omitted for reasons of time 
constraints. For instance, as set theory is not as fundamental 
to analysis as are the number systems, the chapters on set theory 
(Chapters 3, 8) can be covered more quickly and with substantially 
less rigour, or be given as reading assignments. The appendices 
on logic and the decimal system are intended as optional or sup- 
plemental reading and would probably not be covered in the main 
course lectures; the appendix on logic is particularly suitable for 
reading concurrently with the first few chapters. Also, Chapter 
16 (on Fourier series) is not needed elsewhere in the text and can 
be omitted. 

For reasons of length, this textbook has been split into two 
volumes. The first volume is slightly longer, but can be covered 
in about thirty lectures if the peripheral material is omitted or 
abridged. The second volume refers at times to the first, but can 
also be taught to students who have had a first course in analysis 
from other sources. It also takes about thirty lectures to cover. 

I am deeply indebted to my students, who over the progression 
of the real analysis course corrected several errors in the lecture
notes from which this text is derived, and gave other valuable 
feedback. I am also very grateful to the many anonymous refer- 
ees who made several corrections and suggested many important 
improvements to the text. 

Terence Tao 


Chapter 12 


Metric spaces 


12.1 Definitions and examples 


In Definition 6.1.5 we defined what it meant for a sequence $(x_n)_{n=m}^\infty$
of real numbers to converge to another real number $x$; indeed,
this meant that for every $\varepsilon > 0$, there exists an $N \geq m$ such
that $|x - x_n| \leq \varepsilon$ for all $n \geq N$. When this is the case, we write
$\lim_{n \to \infty} x_n = x$.

Intuitively, when a sequence $(x_n)_{n=m}^\infty$ converges to a limit $x$,
this means that somehow the elements $x_n$ of that sequence will
eventually be as close to $x$ as one pleases. One way to phrase
this more precisely is to introduce the distance function $d(x,y)$
between two real numbers by $d(x,y) := |x - y|$. (Thus for instance
$d(3,5) = 2$, $d(5,3) = 2$, and $d(3,3) = 0$.) Then we have


Lemma 12.1.1. Let $(x_n)_{n=m}^\infty$ be a sequence of real numbers, and
let $x$ be another real number. Then $(x_n)_{n=m}^\infty$ converges to $x$ if and
only if $\lim_{n \to \infty} d(x_n, x) = 0$.


Proof. See Exercise 12.1.1. $\Box$


One would now like to generalize this notion of convergence, 
so that one can take limits not just of sequences of real numbers, 
but also sequences of complex numbers, or sequences of vectors, or 
sequences of matrices, or sequences of functions, even sequences 
of sequences. One way to do this is to redefine the notion of
convergence each time we deal with a new type of object. As you
can guess, this will quickly get tedious. A more efficient way is
to work abstractly, defining a very general class of spaces - which 
includes such standard spaces as the real numbers, complex num- 
bers, vectors, etc. - and define the notion of convergence on this 
entire class of spaces at once. (A space is just the set of all objects 
of a certain type - the space of all real numbers, the space of all 
3 x 3 matrices, etc. Mathematically, there is not much distinction 
between a space and a set, except that spaces tend to have much 
more structure than what a random set would have. For instance, 
the space of real numbers comes with operations such as addition 
and multiplication, while a general set would not.) 

It turns out that there are two very useful classes of spaces 
which do the job. The first class is that of metric spaces, which 
we will study here. There is a more general class of spaces, called 
topological spaces, which is also very important, but we will only 
deal with this generalization briefly, in Section 13.5. 

Roughly speaking, a metric space is any space X which has a 
concept of distance d(x,y) - and this distance should behave in a 
reasonable manner. More precisely, we have 


Definition 12.1.2 (Metric spaces). A metric space $(X,d)$ is a
space $X$ of objects (called points), together with a distance function
or metric $d : X \times X \to [0,+\infty)$, which associates to each
pair $x,y$ of points in $X$ a non-negative real number $d(x,y) \geq 0$.
Furthermore, the metric must satisfy the following four axioms:


(a) For any $x \in X$, we have $d(x,x) = 0$.
(b) (Positivity) For any distinct $x,y \in X$, we have $d(x,y) > 0$.
(c) (Symmetry) For any $x,y \in X$, we have $d(x,y) = d(y,x)$.
(d) (Triangle inequality) For any $x,y,z \in X$, we have $d(x,z) \leq d(x,y) + d(y,z)$.

In many cases it will be clear what the metric d is, and we shall 
abbreviate (X,d) as just X. 
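
(Informal) The four axioms can be spot-checked numerically on a finite sample of points. The following Python sketch is an illustration only and is not taken from the text; the name `satisfies_metric_axioms` and the tolerance `tol` are our own choices. Passing the check on a sample does not prove that $d$ is a metric, although failing it does show that $d$ is not one.

```python
from itertools import product

def satisfies_metric_axioms(points, d, tol=1e-12):
    """Check axioms (a)-(d) of Definition 12.1.2 on a finite list of points."""
    for x in points:
        if abs(d(x, x)) > tol:                  # (a) d(x, x) = 0
            return False
    for x, y in product(points, repeat=2):
        if x != y and d(x, y) <= 0:             # (b) positivity
            return False
        if abs(d(x, y) - d(y, x)) > tol:        # (c) symmetry
            return False
    for x, y, z in product(points, repeat=3):
        if d(x, z) > d(x, y) + d(y, z) + tol:   # (d) triangle inequality
            return False
    return True

sample = [-2.0, -0.5, 0.0, 1.0, 3.0]
print(satisfies_metric_axioms(sample, lambda x, y: abs(x - y)))    # True
print(satisfies_metric_axioms(sample, lambda x, y: (x - y) ** 2))  # False
```

The second candidate, $(x-y)^2$, satisfies (a), (b) and (c) but fails the triangle inequality: taking $x = -2$, $y = 0$, $z = 3$ gives $25 > 4 + 9$.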




Remark 12.1.3. The conditions (a) and (b) can be rephrased as
follows: for any $x,y \in X$ we have $d(x,y) = 0$ if and only if $x = y$.
(Why is this equivalent to (a) and (b)?)


Example 12.1.4 (The real line). Let R be the real numbers, and 
let $d : \mathbf{R} \times \mathbf{R} \to [0, \infty)$ be the metric $d(x, y) := |x - y|$ mentioned
earlier. Then (R, d) is a metric space (Exercise 12.1.2). We refer 
to d as the standard metric on R, and if we refer to R as a metric 
space, we assume that the metric is given by the standard metric 
d unless otherwise specified. 


Example 12.1.5 (Induced metric spaces). Let $(X,d)$ be any metric
space, and let $Y$ be a subset of $X$. Then we can restrict the
metric function $d : X \times X \to [0,+\infty)$ to the subset $Y \times Y$ of $X \times X$
to create a restricted metric function $d|_{Y \times Y} : Y \times Y \to [0,+\infty)$
of $Y$; this is known as the metric on $Y$ induced by the metric $d$ on
$X$. The pair $(Y, d|_{Y \times Y})$ is a metric space (Exercise 12.1.4) and is
known as the subspace of $(X,d)$ induced by $Y$. Thus for instance the
metric on the real line in the previous example induces a metric
space structure on any subset of the reals, such as the integers $\mathbf{Z}$,
or an interval $[a,b]$, etc.


Example 12.1.6 (Euclidean spaces). Let $n \geq 1$ be a natural
number, and let $\mathbf{R}^n$ be the space of $n$-tuples of real numbers:

$$\mathbf{R}^n = \{(x_1, x_2, \ldots, x_n) : x_1, \ldots, x_n \in \mathbf{R}\}.$$

We define the Euclidean metric (also called the $l^2$ metric)
$d_{l^2} : \mathbf{R}^n \times \mathbf{R}^n \to \mathbf{R}$ by

$$d_{l^2}((x_1, \ldots, x_n), (y_1, \ldots, y_n)) := \sqrt{(x_1 - y_1)^2 + \ldots + (x_n - y_n)^2} = \Big( \sum_{i=1}^n (x_i - y_i)^2 \Big)^{1/2}.$$

Thus for instance, if $n = 2$, then $d_{l^2}((1,6),(4,2)) = \sqrt{3^2 + 4^2} = 5$.
This metric corresponds to the geometric distance between the
two points $(x_1, x_2, \ldots, x_n)$, $(y_1, y_2, \ldots, y_n)$ as given by Pythagoras’
theorem. (We remark however that while geometry does give some
very important examples of metric spaces, it is possible to have
metric spaces which have no obvious geometry whatsoever. Some
examples are given below.) The verification that $(\mathbf{R}^n, d_{l^2})$ is indeed
a metric space can be seen geometrically (for instance, the triangle
inequality now asserts that the length of one side of a triangle is
always less than or equal to the sum of the lengths of the other two
sides), but can also be proven algebraically (see Exercise 12.1.6).
We refer to $(\mathbf{R}^n, d_{l^2})$ as the Euclidean space of dimension $n$.
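
(Informal) The Euclidean metric is easy to compute directly from its definition. The following Python sketch is an illustration only; the function name `d_l2` is our own, and points of $\mathbf{R}^n$ are represented as tuples of numbers.

```python
import math

def d_l2(x, y):
    """Euclidean (l^2) distance between two n-tuples of the same length."""
    if len(x) != len(y):
        raise ValueError("points must have the same dimension")
    return math.sqrt(sum((xi - yi) ** 2 for xi, yi in zip(x, y)))

# The worked example with n = 2: sqrt(3^2 + 4^2) = 5.
print(d_l2((1, 6), (4, 2)))  # 5.0
```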


Example 12.1.7 (Taxi-cab metric). Again let $n \geq 1$, and let $\mathbf{R}^n$
be as before. But now we use a different metric $d_{l^1}$, the so-called
taxi-cab metric (or $l^1$ metric), defined by

$$d_{l^1}((x_1, x_2, \ldots, x_n), (y_1, y_2, \ldots, y_n)) := |x_1 - y_1| + \ldots + |x_n - y_n| = \sum_{i=1}^n |x_i - y_i|.$$

Thus for instance, if $n = 2$, then $d_{l^1}((1,6),(4,2)) = 3 + 4 = 7$.
This metric is called the taxi-cab metric, because it models the
distance a taxi-cab would have to traverse to get from one point to
another if the cab was only allowed to move in cardinal directions
(north, south, east, west) and not diagonally. As such it is always
at least as large as the Euclidean metric, which measures distance
“as the crow flies”, as it were. We claim that the space $(\mathbf{R}^n, d_{l^1})$
is also a metric space (Exercise 12.1.7). The metrics are not quite
the same, but we do have the inequalities

$$d_{l^2}(x,y) \leq d_{l^1}(x,y) \leq \sqrt{n}\, d_{l^2}(x,y) \qquad (12.1)$$

for all $x,y$ (see Exercise 12.1.8).
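
(Informal) The inequalities (12.1) can be spot-checked on random points. The following Python sketch is an illustration only, not a proof; the names `d_l1` and `d_l2`, the dimension $n = 5$, the sampling range and the small floating-point tolerance are all our own choices.

```python
import math
import random

def d_l1(x, y):
    """Taxi-cab (l^1) distance between two n-tuples."""
    return sum(abs(xi - yi) for xi, yi in zip(x, y))

def d_l2(x, y):
    """Euclidean (l^2) distance between two n-tuples."""
    return math.sqrt(sum((xi - yi) ** 2 for xi, yi in zip(x, y)))

print(d_l1((1, 6), (4, 2)))  # |1 - 4| + |6 - 2| = 7

random.seed(0)
n = 5
for _ in range(1000):
    x = [random.uniform(-10, 10) for _ in range(n)]
    y = [random.uniform(-10, 10) for _ in range(n)]
    l1, l2 = d_l1(x, y), d_l2(x, y)
    # (12.1), with a small tolerance for rounding error
    assert l2 <= l1 + 1e-9
    assert l1 <= math.sqrt(n) * l2 + 1e-9
```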


Remark 12.1.8. The taxi-cab metric is useful in several places,
for instance in the theory of error correcting codes. A string of $n$
binary digits can be thought of as an element of $\mathbf{R}^n$, for instance
the binary string 10010 can be thought of as the point $(1,0,0,1,0)$
in $\mathbf{R}^5$. The taxi-cab distance between two binary strings is then
the number of bits in the two strings which do not match, for
instance $d_{l^1}(10010, 10101) = 3$. The goal of error-correcting codes
is to encode each piece of information (e.g., a letter of the alphabet)
as a binary string in such a way that all the binary strings
are as far away in the taxi-cab metric from each other as possible;
this minimizes the chance that any distortion of the bits due
to random noise can accidentally change one of the coded binary
strings to another, and also maximizes the chance that any such
distortion can be detected and correctly repaired.
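
(Informal) For binary strings the taxi-cab distance simply counts mismatched bits. The following Python sketch is an illustration only; the function name `taxicab_distance_bits` and the representation of binary strings as Python `str` values are our own choices.

```python
def taxicab_distance_bits(s, t):
    """Taxi-cab distance between two binary strings of equal length,
    each viewed as a point of R^n with coordinates 0 or 1."""
    if len(s) != len(t):
        raise ValueError("strings must have the same length")
    if set(s + t) - {"0", "1"}:
        raise ValueError("strings must contain only the digits 0 and 1")
    return sum(abs(int(a) - int(b)) for a, b in zip(s, t))

# The two strings differ in their last three bits.
print(taxicab_distance_bits("10010", "10101"))  # 3
```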


Example 12.1.9 (Sup norm metric). Again let $n \geq 1$, and let $\mathbf{R}^n$
be as before. But now we use a different metric $d_{l^\infty}$, the so-called
sup norm metric (or $l^\infty$ metric), defined by

$$d_{l^\infty}((x_1, x_2, \ldots, x_n), (y_1, y_2, \ldots, y_n)) := \sup\{|x_i - y_i| : 1 \leq i \leq n\}.$$

Thus for instance, if $n = 2$, then $d_{l^\infty}((1,6),(4,2)) = \sup(3,4) = 4$.
The space $(\mathbf{R}^n, d_{l^\infty})$ is also a metric space (Exercise 12.1.9), and
is related to the $l^2$ metric by the inequalities

$$\frac{1}{\sqrt{n}}\, d_{l^2}(x,y) \leq d_{l^\infty}(x,y) \leq d_{l^2}(x,y) \qquad (12.2)$$

for all $x,y$ (see Exercise 12.1.10).
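
(Informal) For the points $(1,6)$ and $(4,2)$ the coordinate differences are $3$ and $4$, so the sup norm distance is $4$. The following Python sketch computes this and spot-checks the inequalities (12.2) on random points; it is an illustration only, not a proof, and the names `d_linf` and `d_l2`, the dimension $n = 5$ and the tolerance are our own choices.

```python
import math
import random

def d_linf(x, y):
    """Sup norm (l^infinity) distance between two n-tuples."""
    return max(abs(xi - yi) for xi, yi in zip(x, y))

def d_l2(x, y):
    """Euclidean (l^2) distance between two n-tuples."""
    return math.sqrt(sum((xi - yi) ** 2 for xi, yi in zip(x, y)))

print(d_linf((1, 6), (4, 2)))  # max(3, 4) = 4

random.seed(0)
n = 5
for _ in range(1000):
    x = [random.uniform(-10, 10) for _ in range(n)]
    y = [random.uniform(-10, 10) for _ in range(n)]
    linf, l2 = d_linf(x, y), d_l2(x, y)
    # (12.2), with a small tolerance for rounding error
    assert l2 / math.sqrt(n) <= linf + 1e-9
    assert linf <= l2 + 1e-9
```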


Remark 12.1.10. The $l^1$, $l^2$, and $l^\infty$ metrics are special cases of
the more general $l^p$ metrics, where $p \in [1,+\infty]$, but we will not
discuss these more general metrics in this text.


Example 12.1.11 (Discrete metric). Let $X$ be an arbitrary set
(finite or infinite), and define the discrete metric $d_{disc}$ by setting
$d_{disc}(x,y) := 0$ when $x = y$, and $d_{disc}(x,y) := 1$ when $x \neq y$.
Thus, in this metric, all points are equally far apart. The space
$(X, d_{disc})$ is a metric space (Exercise 12.1.11). Thus every set $X$
has at least one metric on it.
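
(Informal) The discrete metric needs nothing from the set except a notion of equality. The following Python sketch is an illustration only; the name `d_disc` and the four-element sample set are our own choices, and the loop checks the triangle inequality on that sample rather than proving it.

```python
from itertools import product

def d_disc(x, y):
    """Discrete metric: 0 if the points are equal, 1 otherwise."""
    return 0 if x == y else 1

points = ["a", "b", "c", "d"]
triangle_ok = all(
    d_disc(x, z) <= d_disc(x, y) + d_disc(y, z)
    for x, y, z in product(points, repeat=3)
)
print(triangle_ok)  # True
```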


Example 12.1.12 (Geodesics). (Informal) Let $X$ be the sphere
$\{(x,y,z) \in \mathbf{R}^3 : x^2 + y^2 + z^2 = 1\}$, and let $d((x,y,z),(x',y',z'))$
be the length of the shortest curve in $X$ which starts at $(x,y,z)$
and ends at $(x',y',z')$. (This curve turns out to be an arc of a
great circle; we will not prove this here, as it requires calculus of
variations, which is beyond the scope of this text.) This makes $X$
into a metric space; the reader should be able to verify (without
using any geometry of the sphere) that the triangle inequality is
more or less automatic from the definition.


Example 12.1.13 (Shortest paths). (Informal) Examples of metric
spaces occur all the time in real life. For instance, $X$ could be
all the computers currently connected to the internet, and $d(x,y)$
is the smallest number of connections it would take for a packet
to travel from computer $x$ to computer $y$; for instance, if $x$ and
$y$ are not directly connected, but are both connected to $z$, then
$d(x,y) = 2$. Assuming that all computers in the internet can
ultimately be connected to all other computers (so that $d(x,y)$ is
always finite), then $(X,d)$ is a metric space (why?). Games such
as “six degrees of separation” are also taking place in a similar
metric space (what is the space, and what is the metric, in this
case?). Or, $X$ could be a major city, and $d(x,y)$ could be the
shortest time it takes to drive from $x$ to $y$ (although this space
might not satisfy the symmetry axiom (c) in real life!).
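
(Informal) The connection-count distance on a network can be computed by breadth-first search. The following Python sketch is an illustration only; the function name `hop_distance`, the toy network with nodes `x`, `y`, `z`, `w`, and the choice to return `None` for unreachable nodes are our own assumptions.

```python
from collections import deque

def hop_distance(graph, source, target):
    """Fewest connections needed to get from source to target in an
    undirected graph given as {node: set of neighbours}.
    Returns None if target cannot be reached from source."""
    if source == target:
        return 0
    seen = {source}
    queue = deque([(source, 0)])
    while queue:
        node, dist = queue.popleft()
        for neighbour in graph[node]:
            if neighbour == target:
                return dist + 1
            if neighbour not in seen:
                seen.add(neighbour)
                queue.append((neighbour, dist + 1))
    return None

# x and y are not directly connected, but both are connected to z.
network = {
    "x": {"z"},
    "y": {"z"},
    "z": {"x", "y", "w"},
    "w": {"z"},
}
print(hop_distance(network, "x", "y"))  # 2
```

Since each connection here is listed in both directions, the distance is symmetric; with one-way links the symmetry axiom could fail.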


Now that we have metric spaces, we can define convergence in 
these spaces. 


Definition 12.1.14 (Convergence of sequences in metric spaces).
Let $m$ be an integer, $(X,d)$ be a metric space and let $(x^{(n)})_{n=m}^\infty$ be
a sequence of points in $X$ (i.e., for every natural number $n \geq m$,
we assume that $x^{(n)}$ is an element of $X$). Let $x$ be a point in $X$.
We say that $(x^{(n)})_{n=m}^\infty$ converges to $x$ with respect to the metric
$d$, if and only if the limit $\lim_{n \to \infty} d(x^{(n)}, x)$ exists and is equal
to 0. In other words, $(x^{(n)})_{n=m}^\infty$ converges to $x$ with respect to
$d$ if and only if for every $\varepsilon > 0$, there exists an $N \geq m$ such
that $d(x^{(n)}, x) \leq \varepsilon$ for all $n \geq N$. (Why are these two definitions
equivalent?)


Remark 12.1.15. In view of Lemma 12.1.1 we see that this definition
generalizes our existing notion of convergence of sequences
of real numbers. In many cases, it is obvious what the metric $d$
is, and so we shall often just say “$(x^{(n)})_{n=m}^\infty$ converges to $x$” instead
of “$(x^{(n)})_{n=m}^\infty$ converges to $x$ with respect to the metric $d$”
when there is no chance of confusion. We also sometimes write
“$x^{(n)} \to x$ as $n \to \infty$” instead.


Remark 12.1.16. There is nothing special about the superscript
$n$ in the above definition; it is a dummy variable. Saying that
$(x^{(n)})_{n=m}^\infty$ converges to $x$ is exactly the same statement as saying
that $(x^{(k)})_{k=m}^\infty$ converges to $x$, for example; and sometimes it is
convenient to change superscripts, for instance if the variable $n$
is already being used for some other purpose. Similarly, it is not
necessary for the sequence $x^{(n)}$ to be denoted using the superscript
$(n)$; the above definition is also valid for sequences $x_n$, or functions
$f(n)$, or indeed of any expression which depends on $n$ and takes
values in $X$. Finally, from Exercises 6.1.3, 6.1.4 we see that the
starting point $m$ of the sequence is unimportant for the purposes
of taking limits; if $(x^{(n)})_{n=m}^\infty$ converges to $x$, then $(x^{(n)})_{n=m'}^\infty$ also
converges to $x$ for any $m' \geq m$.


Example 12.1.17. We work in the Euclidean space $\mathbf{R}^2$ 
with the standard Euclidean metric $d_{l^2}$. Let 
$(x^{(n)})_{n=1}^\infty$ denote the sequence 
$x^{(n)} := (1/n, 1/n)$ in $\mathbf{R}^2$, i.e., we are 
considering the sequence (1,1), (1/2, 1/2), (1/3, 1/3), .... Then 
this sequence converges to (0,0) with respect to the Euclidean 
metric $d_{l^2}$, since 

$$\lim_{n \to \infty} d_{l^2}(x^{(n)}, (0,0)) = \lim_{n \to \infty} \sqrt{\frac{1}{n^2} + \frac{1}{n^2}} = \lim_{n \to \infty} \frac{\sqrt{2}}{n} = 0.$$

The sequence $(x^{(n)})_{n=1}^\infty$ also converges to (0,0) 
with respect to the taxi-cab metric $d_{l^1}$, since 

$$\lim_{n \to \infty} d_{l^1}(x^{(n)}, (0,0)) = \lim_{n \to \infty} \left(\frac{1}{n} + \frac{1}{n}\right) = \lim_{n \to \infty} \frac{2}{n} = 0.$$

Similarly the sequence converges to (0,0) in the sup norm metric 
$d_{l^\infty}$ (why?). However, the sequence 
$(x^{(n)})_{n=1}^\infty$ does not converge to (0,0) in the 
discrete metric $d_{disc}$, since 

$$\lim_{n \to \infty} d_{disc}(x^{(n)}, (0,0)) = \lim_{n \to \infty} 1 = 1 \neq 0.$$


Thus the convergence of a sequence can depend on what metric 
one uses¹. 

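The four computations in Example 12.1.17 can be reproduced numerically. This is only an illustrative sketch: the function names `d_l2`, `d_l1`, `d_linf` and `d_disc` are chosen here for convenience, and floating-point values only approximate the limits, they do not prove them.

```python
import math

def d_l2(p, q):
    """Euclidean metric."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))

def d_l1(p, q):
    """Taxi-cab metric."""
    return sum(abs(a - b) for a, b in zip(p, q))

def d_linf(p, q):
    """Sup norm metric."""
    return max(abs(a - b) for a, b in zip(p, q))

def d_disc(p, q):
    """Discrete metric: 0 if the points agree, 1 otherwise."""
    return 0 if p == q else 1

origin = (0.0, 0.0)

def x(n):
    """The n-th term (1/n, 1/n) of the sequence."""
    return (1.0 / n, 1.0 / n)

for n in (1, 10, 100, 1000):
    p = x(n)
    print(n, d_l2(p, origin), d_l1(p, origin), d_linf(p, origin), d_disc(p, origin))
```

The first three columns shrink towards 0 as n grows, while the discrete distance stays at 1 for every n, matching the example.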
In the case of the above four metrics - Euclidean, taxi-cab, sup 
norm, and discrete - it is in fact rather easy to test for convergence. 


Proposition 12.1.18 (Equivalence of $l^1$, $l^2$, $l^\infty$). 
Let $\mathbf{R}^n$ be a Euclidean space, and let 
$(x^{(k)})_{k=m}^\infty$ be a sequence of points in 
$\mathbf{R}^n$. We write 
$x^{(k)} = (x_1^{(k)}, x_2^{(k)}, \ldots, x_n^{(k)})$, i.e., for 
j = 1, 2, ..., n, $x_j^{(k)} \in \mathbf{R}$ is the j-th 
co-ordinate of $x^{(k)} \in \mathbf{R}^n$. Let 
$x = (x_1, \ldots, x_n)$ be a point in $\mathbf{R}^n$. Then the 
following four statements are equivalent: 


(a) $(x^{(k)})_{k=m}^\infty$ converges to x with respect to the 
Euclidean metric $d_{l^2}$. 


(b) $(x^{(k)})_{k=m}^\infty$ converges to x with respect to the 
taxi-cab metric $d_{l^1}$. 


(c) $(x^{(k)})_{k=m}^\infty$ converges to x with respect to the 
sup norm metric $d_{l^\infty}$. 


(d) For every 1 ≤ j ≤ n, the sequence 
$(x_j^{(k)})_{k=m}^\infty$ converges to $x_j$. (Notice that this 
is a sequence of real numbers, not of points in $\mathbf{R}^n$.) 


Proof. See Exercise 12.1.12. □ 


In other words, a sequence converges in the Euclidean, taxi-cab, 
or sup norm metric if and only if each of its components 
converges individually. Because of the equivalence of (a), (b) and 
(c), we say that the Euclidean, taxicab, and sup norm metrics 
on $\mathbf{R}^n$ are equivalent. (There are infinite-dimensional 
analogues of the Euclidean, taxicab, and sup norm metrics which 
are not equivalent, see for instance Exercise 12.1.15.) 


¹For a somewhat whimsical real-life example, one can give a city an 
“automobile metric”, with d(x,y) defined as the time it takes for a car to drive 
from x to y, or a “pedestrian metric”, where d(x, y) is the time it takes to walk 
on foot from x to y. (Let us assume for sake of argument that these metrics 
are symmetric, though this is not always the case in real life.) One can easily 
imagine examples where two points are close in one metric but not another. 


For the discrete metric, convergence is much rarer: the 
sequence must be eventually constant in order to converge. 


Proposition 12.1.19 (Convergence in the discrete metric). Let 
X be any set, and let $d_{disc}$ be the discrete metric on X. Let 
$(x^{(n)})_{n=m}^\infty$ be a sequence of points in X, and let x 
be a point in X. Then $(x^{(n)})_{n=m}^\infty$ converges to x with 
respect to the discrete metric $d_{disc}$ if and only if there 
exists an N ≥ m such that $x^{(n)} = x$ for all n ≥ N. 


Proof. See Exercise 12.1.13. □ 


We now prove a basic fact about converging sequences; they 
can only converge to at most one point at a time. 


Proposition 12.1.20 (Uniqueness of limits). Let (X,d) be a 
metric space, and let $(x^{(n)})_{n=m}^\infty$ be a sequence in X. 
Suppose that there are two points x, x' ∈ X such that 
$(x^{(n)})_{n=m}^\infty$ converges to x with respect to d, and 
$(x^{(n)})_{n=m}^\infty$ also converges to x' with respect to d. 
Then we have x = x'. 


Proof. See Exercise 12.1.14. □ 


Because of the above Proposition, it is safe to introduce the 
following notation: if $(x^{(n)})_{n=m}^\infty$ converges to x in 
the metric d, then we write 
$d\text{-}\lim_{n \to \infty} x^{(n)} = x$, or simply 
$\lim_{n \to \infty} x^{(n)} = x$ when there is no confusion as to 
what d is. For instance, in the example of (1/n, 1/n), we have 

$$d_{l^2}\text{-}\lim_{n \to \infty} \left(\frac{1}{n}, \frac{1}{n}\right) = d_{l^1}\text{-}\lim_{n \to \infty} \left(\frac{1}{n}, \frac{1}{n}\right) = (0, 0),$$

but $d_{disc}\text{-}\lim_{n \to \infty} (\frac{1}{n}, \frac{1}{n})$ 
is undefined. Thus the meaning of 
$d\text{-}\lim_{n \to \infty} x^{(n)}$ can depend on what d is; 
however Proposition 12.1.20 assures us that once d is fixed, there 
can be at most one value of $d\text{-}\lim_{n \to \infty} x^{(n)}$. 
(Of course, it is still possible that this limit does not exist; 
some sequences are not convergent.) Note that by Lemma 12.1.1, 
this definition of limit generalizes the notion of limit in 
Definition 6.1.8. 


Remark 12.1.21. It is possible for a sequence to converge to one 
point using one metric, and another point using a different metric, 
although such examples are usually quite artificial. For instance, 
let X := [0,1], the closed interval from 0 to 1. Using the usual 
metric d, we have $d\text{-}\lim_{n \to \infty} \frac{1}{n} = 0$. 
But now suppose we “swap” the points 0 and 1 in the following 
manner. Let f : [0,1] → [0,1] be the function defined by 
f(0) := 1, f(1) := 0, and f(x) := x for all x ∈ (0,1), and then 
define d'(x, y) := d(f(x), f(y)). Then (X, d') is still a metric 
space (why?), but now 
$d'\text{-}\lim_{n \to \infty} \frac{1}{n} = 1$. 
Thus changing the metric on a space can greatly affect the nature 
of convergence (also called the topology) on that space; see Section 
13.5 for a further discussion of topology. 
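The "swapped" metric of Remark 12.1.21 is easy to experiment with. The sketch below is an illustration only (the name `d_swapped` is chosen here, and exact rational arithmetic is used to avoid rounding): it shows the distance from 1/n to 1 shrinking in the new metric while the distance to 0 approaches 1.

```python
from fractions import Fraction

def d(x, y):
    """Usual metric on [0, 1]."""
    return abs(x - y)

def f(x):
    """Swap the endpoints 0 and 1; fix every other point of [0, 1]."""
    if x == 0:
        return 1
    if x == 1:
        return 0
    return x

def d_swapped(x, y):
    """The metric d'(x, y) := d(f(x), f(y))."""
    return d(f(x), f(y))

for n in (2, 10, 100, 1000):
    xn = Fraction(1, n)
    # First column tends to 0, second column tends to 1.
    print(n, d_swapped(xn, 1), d_swapped(xn, 0))
```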


Exercise 12.1.1. Prove Lemma 12.1.1. 


Exercise 12.1.2. Show that the real line with the metric d(x, y) := |x − y| 
is indeed a metric space. (Hint: you may wish to review your proof of 
Proposition 4.3.3.) 


Exercise 12.1.3. Let X be a set, and let d : X × X → [0, ∞) be a 
function. 


(a) Give an example of a pair (X,d) which obeys axioms (bcd) of 
Definition 12.1.2, but not (a). (Hint: modify the discrete metric.) 


(b) Give an example of a pair (X,d) which obeys axioms (acd) of 
Definition 12.1.2, but not (b). 


(c) Give an example of a pair (X,d) which obeys axioms (abd) of 
Definition 12.1.2, but not (c). 

(d) Give an example of a pair (X,d) which obeys axioms (abc) of 
Definition 12.1.2, but not (d). (Hint: try examples where X is a 
finite set.) 


Exercise 12.1.4. Show that the pair $(Y, d|_{Y \times Y})$ defined in Example 12.1.5 
is indeed a metric space. 


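Exercise 12.1.5. Let n ≥ 1, and let $a_1, a_2, \ldots, a_n$ and 
$b_1, b_2, \ldots, b_n$ be real numbers. Verify the identity 

$$\left(\sum_{i=1}^n a_i b_i\right)^2 + \frac{1}{2} \sum_{i=1}^n \sum_{j=1}^n (a_i b_j - a_j b_i)^2 = \left(\sum_{i=1}^n a_i^2\right)\left(\sum_{j=1}^n b_j^2\right),$$

and conclude the Cauchy-Schwarz inequality 

$$\left|\sum_{i=1}^n a_i b_i\right| \leq \left(\sum_{i=1}^n a_i^2\right)^{1/2} \left(\sum_{j=1}^n b_j^2\right)^{1/2}. \tag{12.3}$$

Then use the Cauchy-Schwarz inequality to prove the triangle inequality 

$$\left(\sum_{i=1}^n (a_i + b_i)^2\right)^{1/2} \leq \left(\sum_{i=1}^n a_i^2\right)^{1/2} + \left(\sum_{j=1}^n b_j^2\right)^{1/2}.$$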
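Before attempting the algebra in Exercise 12.1.5, it can be reassuring to spot-check the identity on concrete numbers. The sketch below does this with exact rational arithmetic on randomly chosen inputs (the helper name `lagrange_sides` and the sample values are arbitrary choices made here); a numerical check of this kind is not a substitute for the proof.

```python
import random
from fractions import Fraction

def lagrange_sides(a, b):
    """Return (left-hand side, right-hand side, dot product) of the identity."""
    n = len(a)
    dot = sum(a[i] * b[i] for i in range(n))
    cross = sum((a[i] * b[j] - a[j] * b[i]) ** 2
                for i in range(n) for j in range(n))
    lhs = dot ** 2 + Fraction(1, 2) * cross
    rhs = sum(t * t for t in a) * sum(t * t for t in b)
    return lhs, rhs, dot

random.seed(0)  # fixed seed so the sample is reproducible
a = [Fraction(random.randint(-9, 9), random.randint(1, 9)) for _ in range(5)]
b = [Fraction(random.randint(-9, 9), random.randint(1, 9)) for _ in range(5)]

lhs, rhs, dot = lagrange_sides(a, b)
print(lhs == rhs)       # the identity holds exactly on this sample
print(dot ** 2 <= rhs)  # squared form of the Cauchy-Schwarz inequality
```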


Exercise 12.1.6. Show that $(\mathbf{R}^n, d_{l^2})$ in Example 12.1.6 is indeed a metric 
space. (Hint: use Exercise 12.1.5.) 


Exercise 12.1.7. Show that the pair $(\mathbf{R}^n, d_{l^1})$ in Example 12.1.7 is indeed 
a metric space. 


Exercise 12.1.8. Prove the two inequalities in (12.1). (For the first 
inequality, square both sides. For the second inequality, use Exercise 
12.1.5.) 


Exercise 12.1.9. Show that the pair $(\mathbf{R}^n, d_{l^\infty})$ in Example 12.1.9 is 
indeed a metric space. 


Exercise 12.1.10. Prove the two inequalities in (12.2). 


Exercise 12.1.11. Show that the discrete metric $(\mathbf{R}^n, d_{disc})$ in Example 
12.1.11 is indeed a metric space. 


Exercise 12.1.12. Prove Proposition 12.1.18. 
Exercise 12.1.13. Prove Proposition 12.1.19. 


Exercise 12.1.14. Prove Proposition 12.1.20. (Hint: modify the proof of 
Proposition 6.1.7.) 


Exercise 12.1.15. Let 

$$X := \left\{ (a_n)_{n=0}^\infty : \sum_{n=0}^\infty |a_n| < \infty \right\}$$

be the space of absolutely convergent sequences. Define the $l^1$ and $l^\infty$ 
metrics on this space by 

$$d_{l^1}((a_n)_{n=0}^\infty, (b_n)_{n=0}^\infty) := \sum_{n=0}^\infty |a_n - b_n|;$$

$$d_{l^\infty}((a_n)_{n=0}^\infty, (b_n)_{n=0}^\infty) := \sup_{n \in \mathbf{N}} |a_n - b_n|.$$

Show that these are both metrics on X, but show that there exist 
sequences $x^{(1)}, x^{(2)}, \ldots$ of elements of X (i.e., sequences of sequences) 
which are convergent with respect to the $d_{l^\infty}$ metric but not with respect 
to the $d_{l^1}$ metric. Conversely, show that any sequence which converges 
in the $d_{l^1}$ metric automatically converges in the $d_{l^\infty}$ metric. 


Exercise 12.1.16. Let $(x_n)_{n=1}^\infty$ and $(y_n)_{n=1}^\infty$ be two sequences in a metric 
space (X,d). Suppose that $(x_n)_{n=1}^\infty$ converges to a point x ∈ X, and 
$(y_n)_{n=1}^\infty$ converges to a point y ∈ X. Show that $\lim_{n \to \infty} d(x_n, y_n) = d(x, y)$. 
(Hint: use the triangle inequality several times.) 


12.2 Some point-set topology of metric spaces 


Having defined the operation of convergence on metric spaces, we 
now define a couple of other related notions, including that of open 
set, closed set, interior, exterior, boundary, and adherent point. 
The study of such notions is known as point-set topology, which 
we shall return to in Section 13.5. 

We first need the notion of a metric ball, or more simply a ball. 


Definition 12.2.1 (Balls). Let (X,d) be a metric space, let $x_0$ 
be a point in X, and let r > 0. We define the ball 
$B_{(X,d)}(x_0, r)$ in X, centered at $x_0$, and with radius r, in 
the metric d, to be the set 

$$B_{(X,d)}(x_0, r) := \{x \in X : d(x, x_0) < r\}.$$

When it is clear what the metric space (X, d) is, we shall 
abbreviate $B_{(X,d)}(x_0, r)$ as just $B(x_0, r)$. 


Example 12.2.2. In $\mathbf{R}^2$ with the Euclidean metric 
$d_{l^2}$, the ball $B_{(\mathbf{R}^2, d_{l^2})}((0,0), 1)$ is the 
open disc 

$$B_{(\mathbf{R}^2, d_{l^2})}((0,0), 1) = \{(x, y) \in \mathbf{R}^2 : x^2 + y^2 < 1\}.$$

However, if one uses the taxi-cab metric $d_{l^1}$ instead, then we 
obtain a diamond: 

$$B_{(\mathbf{R}^2, d_{l^1})}((0,0), 1) = \{(x, y) \in \mathbf{R}^2 : |x| + |y| < 1\}.$$

If we use the discrete metric, the ball is now reduced to a single 
point: 

$$B_{(\mathbf{R}^2, d_{disc})}((0,0), 1) = \{(0, 0)\},$$

although if one increases the radius to be larger than 1, then the 
ball now encompasses all of $\mathbf{R}^2$. (Why?) 
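The different shapes of the unit ball can be probed by testing individual points for membership. A minimal sketch follows (the helper `in_ball` and the sample point (0.6, 0.6) are choices made here for illustration):

```python
import math

def d_l2(p, q):
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))

def d_l1(p, q):
    return sum(abs(a - b) for a, b in zip(p, q))

def d_disc(p, q):
    return 0 if p == q else 1

def in_ball(metric, center, radius, point):
    """True when point lies in the ball B(center, radius) for the given metric."""
    return metric(point, center) < radius

origin = (0.0, 0.0)
p = (0.6, 0.6)

print(in_ball(d_l2, origin, 1, p))      # inside the open disc
print(in_ball(d_l1, origin, 1, p))      # outside the diamond
print(in_ball(d_disc, origin, 1, p))    # the radius-1 discrete ball is {origin}
print(in_ball(d_disc, origin, 1.5, p))  # a radius above 1 captures every point
```

The point (0.6, 0.6) has Euclidean distance about 0.85 from the origin but taxi-cab distance 1.2, which is why it lies in the disc and not in the diamond.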


Example 12.2.3. In R with the usual metric d, the open interval 
(3,7) is also the metric ball $B_{(\mathbf{R},d)}(5, 2)$. 


Remark 12.2.4. Note that the smaller the radius r, the smaller 
the ball $B(x_0, r)$. However, $B(x_0, r)$ always contains at least 
one point, namely the center $x_0$, as long as r stays positive, 
thanks to Definition 12.1.2(a). (We don’t consider balls of zero 
radius or negative radius since they are rather boring, being just 
the empty set.) 


Using metric balls, one can now take a set E in a metric space 
X, and classify three types of points in X: interior, exterior, and 
boundary points of E. 


Definition 12.2.5 (Interior, exterior, boundary). Let (X,d) be a 
metric space, let E be a subset of X, and let $x_0$ be a point in X. 
We say that $x_0$ is an interior point of E if there exists a radius 
r > 0 such that $B(x_0, r) \subseteq E$. We say that $x_0$ is an 
exterior point of E if there exists a radius r > 0 such that 
$B(x_0, r) \cap E = \emptyset$. We say that $x_0$ is a boundary 
point of E if it is neither an interior point nor an exterior point 
of E. 


The set of all interior points of E is called the interior of E 
and is sometimes denoted int(E). The set of exterior points of E 
is called the exterior of E and is sometimes denoted ext(E). The 
set of boundary points of E is called the boundary of E and is 
sometimes denoted ∂E. 


Remark 12.2.6. If $x_0$ is an interior point of E, then $x_0$ must 
actually be an element of E, since balls $B(x_0, r)$ always contain 
their center $x_0$. Conversely, if $x_0$ is an exterior point of E, 
then $x_0$ cannot be an element of E. In particular it is not 
possible for $x_0$ to simultaneously be an interior and an exterior 
point of E. If $x_0$ is a boundary point of E, then it could be an 
element of E, but it could also not lie in E; we give some examples 
below. 


Example 12.2.7. We work on the real line R with the standard 
metric d. Let E be the half-open interval E = [1,2). The point 
1.5 is an interior point of E, since one can find a ball (for instance 
B(1.5, 0.1)) centered at 1.5 which lies in E. The point 3 is an 
exterior point of E, since one can find a ball (for instance B(3, 0.1)) 
centered at 3 which is disjoint from E. The points 1 and 2 however, 
are neither interior points nor exterior points of E, and are thus 
boundary points of E. Thus in this case int(E) = (1, 2), ext(E) = 
(−∞, 1) ∪ (2, ∞), and ∂E = {1, 2}. Note that in this case one of 
the boundary points is an element of E, while the other is not. 
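For a half-open interval on the real line, the classification in Example 12.2.7 reduces to comparing a point with the two endpoints. The following sketch (the function `classify` is a hypothetical helper written for this illustration, and it only handles sets of the form [a, b) with the usual metric) makes that explicit.

```python
def classify(x0, a=1.0, b=2.0):
    """Classify x0 relative to E = [a, b) in R with d(x, y) = |x - y|."""
    # Largest r for which the ball (x0 - r, x0 + r) stays inside (a, b).
    inner = min(x0 - a, b - x0)
    if inner > 0:
        return "interior"
    # Distance from x0 to E when x0 lies strictly outside [a, b];
    # the ball of that radius around x0 is disjoint from E.
    gap = max(a - x0, x0 - b)
    if gap > 0:
        return "exterior"
    # Every ball around x0 meets both E and its complement.
    return "boundary"

for point in (1.5, 3.0, 1.0, 2.0, 0.0):
    print(point, classify(point))
```

Note that the boundary point 1 belongs to E while the boundary point 2 does not, exactly as in the example.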


Example 12.2.8. When we give a set X the discrete metric $d_{disc}$, 
and E is any subset of X, then every element of E is an interior 
point of E, every point not contained in E is an exterior point of 
E, and there are no boundary points; see Exercise 12.2.1. 


Definition 12.2.9 (Closure). Let (X,d) be a metric space, let E 
be a subset of X, and let $x_0$ be a point in X. We say that $x_0$ 
is an adherent point of E if for every radius r > 0, the ball 
$B(x_0, r)$ has a non-empty intersection with E. The set of all 
adherent points of E is called the closure of E and is denoted 
$\overline{E}$. 


Note that these notions are consistent with the corresponding 
notions on the real line defined in Definitions 9.1.8, 9.1.10 (why?). 
The following proposition links the notions of adherent point 
with interior and boundary point, and also to that of convergence. 


Proposition 12.2.10. Let (X,d) be a metric space, let E be a 
subset of X, and let $x_0$ be a point in X. Then the following 
statements are logically equivalent. 


(a) $x_0$ is an adherent point of E. 


(b) $x_0$ is either an interior point or a boundary point of E. 


(c) There exists a sequence $(x_n)_{n=1}^\infty$ in E which 
converges to $x_0$ with respect to the metric d. 


Proof. See Exercise 12.2.2. □ 


From the equivalence of Proposition 12.2.10(a) and (b) we ob- 
tain an immediate corollary: 


Corollary 12.2.11. Let (X,d) be a metric space, and let E be a 
subset of X. Then 
$\overline{E} = \mathrm{int}(E) \cup \partial E = X \setminus \mathrm{ext}(E)$. 


As remarked earlier, the boundary of a set E may or may not 
lie in E. Depending on how the boundary is situated, we may call 
a set open, closed, or neither: 


Definition 12.2.12 (Open and closed sets). Let (X,d) be a 
metric space, and let E be a subset of X. We say that E is closed 
if it contains all of its boundary points, i.e., ∂E ⊆ E. We say 
that E is open if it contains none of its boundary points, i.e., 
∂E ∩ E = ∅. If E contains some of its boundary points but not 
others, then it is neither open nor closed. 


Example 12.2.13. We work in the real line R with the standard 
metric d. The set (1,2) does not contain either of its boundary 
points 1, 2 and is hence open. The set [1,2] contains both of 
its boundary points 1, 2 and is hence closed. The set [1,2) 
contains one of its boundary points, 1, but does not contain the 
other boundary point, 2, so is neither open nor closed. 


Remark 12.2.14. It is possible for a set to be simultaneously 
open and closed, if it has no boundary. For instance, in a metric 
space (X,d), the whole space X has no boundary (every point in 
X is an interior point - why?), and so X is both open and closed. 
The empty set ∅ also has no boundary (every point in X is an 
exterior point - why?), and so is both open and closed. In many 
cases these are the only sets that are simultaneously open and 
closed, but there are exceptions. For instance, using the discrete 
metric $d_{disc}$, every set is both open and closed! (why?) 


From the above two remarks we see that the notions of being 
open and being closed are not negations of each other; there are 
sets that are both open and closed, and there are sets which are 
neither open nor closed. Thus, if one knew for instance that E 
was not an open set, it would be erroneous to conclude from this 
that E was a closed set, and similarly with the rôles of open and 
closed reversed. The correct relationship between open and closed 
sets is given by Proposition 12.2.15(e) below. 

Now we list some more properties of open and closed sets. 


Proposition 12.2.15 (Basic properties of open and closed sets). 
Let (X,d) be a metric space. 


(a) Let E be a subset of X. Then E is open if and only if 
E = int(E). In other words, E is open if and only if for 
every x ∈ E, there exists an r > 0 such that B(x, r) ⊆ E. 


(b) Let E be a subset of X. Then E is closed if and only if E 
contains all its adherent points. In other words, E is closed 
if and only if for every convergent sequence 
$(x_n)_{n=m}^\infty$ in E, the limit $\lim_{n \to \infty} x_n$ 
of that sequence also lies in E. 


(c) For any $x_0 \in X$ and r > 0, the ball $B(x_0, r)$ is an open 
set. The set $\{x \in X : d(x, x_0) \leq r\}$ is a closed set. 
(This set is sometimes called the closed ball of radius r centered 
at $x_0$.) 


(d) Any singleton set $\{x_0\}$, where $x_0 \in X$, is automatically 
closed. 


(e) If E is a subset of X, then E is open if and only if the 
complement X\E := {x ∈ X : x ∉ E} is closed. 


(f) If $E_1, \ldots, E_n$ are a finite collection of open sets in X, 
then $E_1 \cap E_2 \cap \ldots \cap E_n$ is also open. If 
$F_1, \ldots, F_n$ is a finite collection of closed sets in X, then 
$F_1 \cup F_2 \cup \ldots \cup F_n$ is also closed. 


(g) If $\{E_\alpha\}_{\alpha \in I}$ is a collection of open sets 
in X (where the index set I could be finite, countable, or 
uncountable), then the union 
$\bigcup_{\alpha \in I} E_\alpha := \{x \in X : x \in E_\alpha \text{ for some } \alpha \in I\}$ 
is also open. If $\{F_\alpha\}_{\alpha \in I}$ is a collection of 
closed sets in X, then the intersection 
$\bigcap_{\alpha \in I} F_\alpha := \{x \in X : x \in F_\alpha \text{ for all } \alpha \in I\}$ 
is also closed. 


(h) If E is any subset of X, then int(E) is the largest open set 
which is contained in E; in other words, int(E) is open, 
and given any other open set V ⊆ E, we have V ⊆ int(E). 
Similarly $\overline{E}$ is the smallest closed set which contains 
E; in other words, $\overline{E}$ is closed, and given any other 
closed set K ⊇ E, we have $K \supseteq \overline{E}$. 


Proof. See Exercise 12.2.3. □ 


Exercise 12.2.1. Verify the claims in Example 12.2.8. 


Exercise 12.2.2. Prove Proposition 12.2.10. (Hint: for some of the im- 
plications one will need the axiom of choice, as in Lemma 8.4.5.) 


Exercise 12.2.3. Prove Proposition 12.2.15. (Hint: you can use earlier 
parts of the proposition to prove later ones.) 


Exercise 12.2.4. Let (X,d) be a metric space, $x_0$ be a point in X, and 
r > 0. Let B be the open ball $B := B(x_0, r) = \{x \in X : d(x, x_0) < r\}$, 
and let C be the closed ball $C := \{x \in X : d(x, x_0) \leq r\}$. 


(a) Show that $\overline{B} \subseteq C$. 


(b) Give an example of a metric space (X, d), a point $x_0$, and a radius 
r > 0 such that $\overline{B}$ is not equal to C. 


12.3 Relative topology 


When we defined notions such as open and closed sets, we men- 
tioned that such concepts depended on the choice of metric one 
uses. For instance, on the real line R, if one uses the usual metric 
d(x,y) = |x — y|, then the set {1} is not open, however if instead 
one uses the discrete metric ddisc, then {1} is now an open set 
(why?). 




However, it is not just the choice of metric which determines 
what is open and what is not - it is also the choice of ambient 
space X. Here are some examples. 


Example 12.3.1. Consider the plane $\mathbf{R}^2$ with the 
Euclidean metric $d_{l^2}$. Inside the plane, we can find the 
x-axis X := {(x, 0) : x ∈ R}. The metric $d_{l^2}$ can be 
restricted to X, creating a subspace 
$(X, d_{l^2}|_{X \times X})$ of $(\mathbf{R}^2, d_{l^2})$. (This 
subspace is essentially the same as the real line (R,d) with the 
usual metric; the precise way of stating this is that 
$(X, d_{l^2}|_{X \times X})$ is isometric to (R,d). We will not 
pursue this concept further in this text, however.) Now consider 
the set 

$$E := \{(x, 0) : -1 < x < 1\}$$

which is both a subset of X and of $\mathbf{R}^2$. Viewed as a 
subset of $\mathbf{R}^2$, it is not open, because the point 0, for 
instance, lies in E but is not an interior point of E. (Any ball 
$B_{(\mathbf{R}^2, d_{l^2})}(0, r)$ will contain at least one point 
that lies outside of the x-axis, and hence outside of E.) On the 
other hand, if viewed as a subset of X, it is open; every point of 
E is an interior point of E with respect to the metric space 
$(X, d_{l^2}|_{X \times X})$. For instance, the point 0 is now an 
interior point of E, because the ball 
$B_{(X, d_{l^2}|_{X \times X})}(0, 1)$ is contained in E (in fact, 
in this case it is E). 


Example 12.3.2. Consider the real line R with the standard 
metric d, and let X be the interval X := (−1, 1) contained inside 
R; we can then restrict the metric d to X, creating a subspace 
$(X, d|_{X \times X})$ of (R,d). Now consider the set [0,1). This 
set is not closed in R, because the point 1 is adherent to [0,1) 
but is not contained in [0,1). However, when considered as a 
subset of X, the set [0,1) now becomes closed; the point 1 is not 
an element of X and so is no longer considered an adherent point 
of [0,1), and so now [0,1) contains all of its adherent points. 


To clarify this distinction, we make a definition. 


Definition 12.3.3 (Relative topology). Let (X,d) be a metric 
space, let Y be a subset of X, and let E be a subset of Y. We say 
that E is relatively open with respect to Y if it is open in the 
metric subspace $(Y, d|_{Y \times Y})$. Similarly, we say that E is 
relatively closed with respect to Y if it is closed in the metric 
space $(Y, d|_{Y \times Y})$. 


The relationship between open (or closed) sets in X, and rel- 
atively open (or relatively closed) sets in Y, is the following. 


Proposition 12.3.4. Let (X,d) be a metric space, let Y be a 
subset of X, and let E be a subset of Y. 


(a) E is relatively open with respect to Y if and only if 
E = V ∩ Y for some set V ⊆ X which is open in X. 


(b) E is relatively closed with respect to Y if and only if 
E = K ∩ Y for some set K ⊆ X which is closed in X. 


Proof. We just prove (a), and leave (b) to Exercise 12.3.1. First 
suppose that E is relatively open with respect to Y. Then, E 
is open in the metric space $(Y, d|_{Y \times Y})$. Thus, for every 
x ∈ E, there exists a radius r > 0 such that the ball 
$B_{(Y, d|_{Y \times Y})}(x, r)$ is contained in E. This radius r 
depends on x; to emphasize this we write $r_x$ instead of r, thus 
for every x ∈ E the ball $B_{(Y, d|_{Y \times Y})}(x, r_x)$ is 
contained in E. (Note that we have used the axiom of choice, 
Proposition 8.4.7, to do this.) 

Now consider the set 

$$V := \bigcup_{x \in E} B_{(X,d)}(x, r_x).$$

This is a subset of X. By Proposition 12.2.15(c) and (g), V is 
open. Now we prove that E = V ∩ Y. Certainly any point x in E 
lies in V ∩ Y, since it lies in Y and it also lies in 
$B_{(X,d)}(x, r_x)$, and hence in V. Now suppose that y is a point 
in V ∩ Y. Then y ∈ V, which implies that there exists an x ∈ E 
such that $y \in B_{(X,d)}(x, r_x)$. But since y is also in Y, this 
implies that $y \in B_{(Y, d|_{Y \times Y})}(x, r_x)$. But by 
definition of $r_x$, this means that y ∈ E, as desired. Thus we 
have found an open set V for which E = V ∩ Y as desired. 

Now we do the converse. Suppose that E = V ∩ Y for some 
open set V; we have to show that E is relatively open with respect 
to Y. Let x be any point in E; we have to show that x is an interior 
point of E in the metric space $(Y, d|_{Y \times Y})$. Since x ∈ E, 
we know x ∈ V. Since V is open in X, we know that there is a radius 
r > 0 such that $B_{(X,d)}(x, r)$ is contained in V. Strictly 
speaking, r depends on x, and so we could write $r_x$ instead of r, 
but for this argument we will only use a single choice of x (as 
opposed to the argument in the previous paragraph) and so we will 
not bother to subscript r here. Since E = V ∩ Y, this means that 
$B_{(X,d)}(x, r) \cap Y$ is contained in E. But 
$B_{(X,d)}(x, r) \cap Y$ is exactly the same as 
$B_{(Y, d|_{Y \times Y})}(x, r)$ (why?), and so 
$B_{(Y, d|_{Y \times Y})}(x, r)$ is contained in E. Thus x is an 
interior point of E in the metric space $(Y, d|_{Y \times Y})$, as 
desired. □ 
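The key step in the converse direction is that a ball in the subspace Y is the corresponding ball in X intersected with Y. The sketch below checks this on a small finite sample, assuming a hypothetical grid of integer points for X and its points on the x-axis for Y. A finite check like this only illustrates the identity; it does not prove it, and it says nothing about openness, since every subset of a finite metric space is open.

```python
import math
from itertools import product

def d(p, q):
    """Euclidean metric on the plane."""
    return math.hypot(p[0] - q[0], p[1] - q[1])

def ball(space, center, radius):
    """The ball of the given radius around center, taken inside `space`."""
    return {p for p in space if d(p, center) < radius}

# Hypothetical finite sample: X is a 7-by-7 grid, Y is its slice on the x-axis.
X = set(product(range(-3, 4), repeat=2))
Y = {p for p in X if p[1] == 0}

# Compare the ball in the subspace Y with (ball in X) intersected with Y.
checks = [
    ball(Y, c, r) == ball(X, c, r) & Y
    for c in Y
    for r in (0.5, 1.0, 1.5, 2.5)
]
print(all(checks))
print(sorted(ball(Y, (0, 0), 1.5)))
```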


Exercise 12.3.1. Prove Proposition 12.3.4(b). 


12.4 Cauchy sequences and complete metric spaces 


We now generalize much of the theory of limits of sequences from 
Chapter 6 to the setting of general metric spaces. We begin by 
generalizing the notion of a subsequence from Definition 6.6.1: 


Definition 12.4.1 (Subsequences). Suppose that 
$(x^{(n)})_{n=m}^\infty$ is a sequence of points in a metric space 
(X,d). Suppose that $n_1, n_2, n_3, \ldots$ is an increasing 
sequence of integers which are at least as large as m, thus 

$$m \leq n_1 < n_2 < n_3 < \ldots.$$

Then we call the sequence $(x^{(n_j)})_{j=1}^\infty$ a subsequence 
of the original sequence $(x^{(n)})_{n=m}^\infty$. 


Examples 12.4.2. The sequence 
$((\frac{1}{j^2}, \frac{1}{j^2}))_{j=1}^\infty$ in $\mathbf{R}^2$ 
is a subsequence of the sequence 
$((\frac{1}{n}, \frac{1}{n}))_{n=1}^\infty$ (in this case, 
$n_j := j^2$). The sequence 1, 1, 1, 1, ... is a subsequence of 
1, 0, 1, 0, 1, .... 


If a sequence converges, then so do all of its subsequences: 


Lemma 12.4.3. Let $(x^{(n)})_{n=m}^\infty$ be a sequence in (X,d) 
which converges to some limit $x_0$. Then every subsequence 
$(x^{(n_j)})_{j=1}^\infty$ of that sequence also converges to 
$x_0$. 


Proof. See Exercise 12.4.1. □ 


On the other hand, it is possible for a subsequence to be con- 
vergent without the sequence as a whole being convergent. For ex- 
ample, the sequence 1,0,1,0,1,... is not convergent, even though 
certain subsequences of it (such as 1,1,1, ...) converge. To quan- 
tify this phenomenon, we generalize Definition 6.4.1 as follows: 


Definition 12.4.4 (Limit points). Suppose that 
$(x^{(n)})_{n=m}^\infty$ is a sequence of points in a metric space 
(X,d), and let L ∈ X. We say that L is a limit point of 
$(x^{(n)})_{n=m}^\infty$ iff for every N ≥ m and ε > 0 there 
exists an n ≥ N such that $d(x^{(n)}, L) \leq \varepsilon$. 


Proposition 12.4.5. Let $(x^{(n)})_{n=m}^\infty$ be a sequence of 
points in a metric space (X,d), and let L ∈ X. Then the following 
are equivalent: 


• L is a limit point of $(x^{(n)})_{n=m}^\infty$. 


• There exists a subsequence $(x^{(n_j)})_{j=1}^\infty$ of the 
original sequence $(x^{(n)})_{n=m}^\infty$ which converges to L. 


Proof. See Exercise 12.4.2. □ 


Next, we review the notion of a Cauchy sequence from Defini- 
tion 6.1.3 (see also Definition 5.1.8). 


Definition 12.4.6 (Cauchy sequences). Let 
$(x^{(n)})_{n=m}^\infty$ be a sequence of points in a metric space 
(X,d). We say that this sequence is a Cauchy sequence iff for 
every ε > 0, there exists an N ≥ m such that 
$d(x^{(j)}, x^{(k)}) \leq \varepsilon$ for all j, k ≥ N. 


Lemma 12.4.7 (Convergent sequences are Cauchy sequences). 
Let $(x^{(n)})_{n=m}^\infty$ be a sequence in (X,d) which converges 
to some limit $x_0$. Then $(x^{(n)})_{n=m}^\infty$ is also a 
Cauchy sequence. 


Proof. See Exercise 12.4.3. □ 


It is also easy to check that a subsequence of a Cauchy sequence 
is also a Cauchy sequence (why?). However, not every Cauchy 
sequence converges: 


Example 12.4.8. (Informal) Consider the sequence 

3, 3.1, 3.14, 3.141, 3.1415, ... 

in the metric space (Q, d) (the rationals Q with the usual metric 
d(x,y) := |x − y|). While this sequence is convergent in R (it 
converges to π), it does not converge in Q (since π ∉ Q, and a 
sequence cannot converge to two different limits). 
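The Cauchy property of this sequence can be observed with exact rational arithmetic. The sketch below assumes the first twenty decimal digits of π (hard-coded in `DIGITS`) and checks, on that finite range only, that terms from index N onwards differ by at most 10^(-N).

```python
from fractions import Fraction

# Leading decimal digits of pi: 3.1415926535897932384...
DIGITS = "31415926535897932384"

def x(n):
    """The n-th decimal truncation 3, 3.1, 3.14, ... as an exact rational."""
    return Fraction(int(DIGITS[: n + 1]), 10 ** n)

def cauchy_gap(N, last=len(DIGITS) - 1):
    """Largest |x(j) - x(k)| over N <= j, k <= last."""
    terms = [x(n) for n in range(N, last + 1)]
    return max(terms) - min(terms)

for N in (0, 1, 5, 10):
    # The gap should be at most 10^(-N).
    print(N, float(cauchy_gap(N)), cauchy_gap(N) <= Fraction(1, 10 ** N))
```

Every term is a rational number, so the sequence lives in Q, yet the value its terms approach is not rational.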


So in certain metric spaces, Cauchy sequences do not nec- 
essarily converge. However, if even part of a Cauchy sequence 
converges, then the entire Cauchy sequence must converge (to the 
same limit): 


Lemma 12.4.9. Let $(x^{(n)})_{n=m}^\infty$ be a Cauchy sequence 
in (X,d). Suppose that there is some subsequence 
$(x^{(n_j)})_{j=1}^\infty$ of this sequence which converges to a 
limit $x_0$ in X. Then the original sequence 
$(x^{(n)})_{n=m}^\infty$ also converges to $x_0$. 


Proof. See Exercise 12.4.4. □ 


In Example 12.4.8 we saw an example of a metric space which 
contained Cauchy sequences which did not converge. However, 
in Theorem 6.4.18 we saw that in the metric space (R, d), every 
Cauchy sequence did have a limit. This motivates the following 
definition. 


Definition 12.4.10 (Complete metric spaces). A metric space 
(X,d) is said to be complete iff every Cauchy sequence in (X,d) 
is in fact convergent in (X,d). 


Example 12.4.11. By Theorem 6.4.18, the reals (R, d) are com- 
plete; by Example 12.4.8, the rationals (Q, d), on the other hand, 
are not complete. 




Complete metric spaces have some nice properties. For in- 
stance, they are intrinsically closed: no matter what space one 
places them in, they are always closed sets. More precisely: 


Proposition 12.4.12. (a) Let (X,d) be a metric space, and let 
$(Y, d|_{Y \times Y})$ be a subspace of (X,d). If 
$(Y, d|_{Y \times Y})$ is complete, then Y must be closed in X. 


(b) Conversely, suppose that (X,d) is a complete metric space, 
and Y is a closed subset of X. Then the subspace 
$(Y, d|_{Y \times Y})$ is also complete. 


Proof. See Exercise 12.4.7. □ 


In contrast, an incomplete metric space such as (Q, d) may be 
considered closed in some spaces (for instance, Q is closed in Q) 
but not in others (for instance, Q is not closed in R). Indeed, 
it turns out that given any incomplete metric space (X,d), there 
exists a completion $(\overline{X}, \overline{d})$, which is a 
larger metric space containing (X,d) which is complete, and such 
that X is not closed in $\overline{X}$ (indeed, the closure of X 
in $(\overline{X}, \overline{d})$ will be all of $\overline{X}$); 
see Exercise 12.4.8. For instance, one possible completion of Q 
is R. 


Exercise 12.4.1. Prove Lemma 12.4.3. (Hint: review your proof of 
Proposition 6.6.5.) 


Exercise 12.4.2. Prove Proposition 12.4.5. (Hint: review your proof of 
Proposition 6.6.6.) 


Exercise 12.4.3. Prove Lemma 12.4.7. (Hint: review your proof of 
Proposition 6.1.12.) 


Exercise 12.4.4. Prove Lemma 12.4.9. 


Exercise 12.4.5. Let $(x^{(n)})_{n=m}^\infty$ be a sequence of points in a metric space 
(X,d), and let L ∈ X. Show that if L is a limit point of the sequence 
$(x^{(n)})_{n=m}^\infty$, then L is an adherent point of the set $\{x^{(n)} : n \geq m\}$. Is the 
converse true? 


Exercise 12.4.6. Show that every Cauchy sequence can have at most one 
limit point. 


Exercise 12.4.7. Prove Proposition 12.4.12. 




Exercise 12.4.8. The following construction generalizes the construction 
of the reals from the rationals in Chapter 5, allowing one to view any 
metric space as a subspace of a complete metric space. In what follows 
we let (X,d) be a metric space. 


(a) Given any Cauchy sequence $(x_n)_{n=1}^\infty$ in X, we introduce the formal 
limit $\mathrm{LIM}_{n \to \infty} x_n$. We say that two formal limits $\mathrm{LIM}_{n \to \infty} x_n$ 
and $\mathrm{LIM}_{n \to \infty} y_n$ are equal if $\lim_{n \to \infty} d(x_n, y_n)$ is equal to zero. 
Show that this equality relation obeys the reflexive, symmetry, 
and transitive axioms. 


(b) Let $\overline{X}$ be the space of all formal limits of Cauchy sequences in X, 
with the above equality relation. Define a metric 
$d_{\overline{X}} : \overline{X} \times \overline{X} \to \mathbf{R}^+$ by setting 

$$d_{\overline{X}}(\mathrm{LIM}_{n \to \infty} x_n, \mathrm{LIM}_{n \to \infty} y_n) := \lim_{n \to \infty} d(x_n, y_n).$$

Show that this function is well-defined (this means not only that 
the limit $\lim_{n \to \infty} d(x_n, y_n)$ exists, but also that the axiom of 
substitution is obeyed; cf. Lemma 5.3.7), and gives $\overline{X}$ the structure 
of a metric space. 


(c) Show that the metric space $(\overline{X}, d_{\overline{X}})$ is complete. 


(d) We identify an element x ∈ X with the corresponding formal limit 
$\mathrm{LIM}_{n \to \infty} x$ in $\overline{X}$; show that this is legitimate by verifying that 
$x = y \iff \mathrm{LIM}_{n \to \infty} x = \mathrm{LIM}_{n \to \infty} y$. With this identification, 
show that $d(x, y) = d_{\overline{X}}(x, y)$, and thus (X, d) can now be thought 
of as a subspace of $(\overline{X}, d_{\overline{X}})$. 


(e) Show that the closure of X in $\overline{X}$ is $\overline{X}$ (which explains the choice 
of notation $\overline{X}$). 


(f) Show that the formal limit agrees with the actual limit, thus if 
$(x_n)_{n=1}^\infty$ is any Cauchy sequence in X, then we have 
$\lim_{n \to \infty} x_n = \mathrm{LIM}_{n \to \infty} x_n$ in $\overline{X}$. 


12.5 Compact metric spaces 


We now come to one of the most useful notions in point set 
topology, that of compactness. Recall the Heine-Borel theorem 
(Theorem 9.1.24), which asserted that every sequence in a closed and 
bounded subset X of the real line R had a convergent subsequence 
whose limit was also in X. Conversely, only the closed and bounded 
sets have this property. This property turns out to be so useful 
that we give it a name. 


Definition 12.5.1 (Compactness). A metric space (X,d) is said
to be compact iff every sequence in (X,d) has at least one
convergent subsequence. A subset Y of a metric space X is said to be
compact if the subspace $(Y, d|_{Y \times Y})$ is compact.


Remark 12.5.2. The notion of a set Y being compact is intrinsic,
in the sense that it only depends on the metric function $d|_{Y \times Y}$
restricted to Y, and not on the choice of the ambient space X. The
notions of completeness in Definition 12.4.10, and of boundedness
below in Definition 12.5.3, are also intrinsic, but the notions of
open and closed are not (see the discussion in Section 12.3).


Thus, Theorem 9.1.24 shows that in the real line R with the 
usual metric, every closed and bounded set is compact, and con- 
versely every compact set is closed and bounded. 

Now we investigate how the Heine-Borel theorem extends to other
metric spaces.


Definition 12.5.3 (Bounded sets). Let (X,d) be a metric space,
and let Y be a subset of X. We say that Y is bounded iff there
exists a ball B(x,r) in X which contains Y.


Remark 12.5.4. This definition is compatible with the definition 
of a bounded set in Definition 9.1.22 (Exercise 12.5.1). 


Proposition 12.5.5. Let (X,d) be a compact metric space. Then 
(X,d) is both complete and bounded. 


Proof. See Exercise 12.5.2. □


From this proposition and Proposition 12.4.12(a) we obtain 
one half of the Heine-Borel theorem for general metric spaces: 


Corollary 12.5.6 (Compact sets are closed and bounded). Let 
(X,d) be a metric space, and let Y be a compact subset of X. 
Then Y is closed and bounded. 


The other half of the Heine-Borel theorem is true in Euclidean 
spaces: 


Theorem 12.5.7 (Heine-Borel theorem). Let $(\mathbf{R}^n, d)$ be a
Euclidean space with either the Euclidean metric, the taxicab metric, or
the sup norm metric. Let E be a subset of $\mathbf{R}^n$. Then E is compact
if and only if it is closed and bounded.


Proof. See Exercise 12.5.3. □


However, the Heine-Borel theorem is not true for more gen- 
eral metrics. For instance, the integers Z with the discrete metric 
is closed (indeed, it is complete) and bounded, but not compact, 
since the sequence 1, 2, 3, 4, ... is in Z but has no convergent
subsequence (why?). Another example is in Exercise 12.5.8. However,
a version of the Heine-Borel theorem is available if one is willing 
to replace closedness with the stronger notion of completeness, 
and boundedness with the stronger notion of total boundedness; 
see Exercise 12.5.10. 
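
The discrete-metric example can be explored on a finite sample. The following Python sketch is only an illustration: the names `d_disc`, `terms`, `pairwise` and `in_ball` are made up here, and checking fifty terms proves nothing about the whole sequence. It merely shows the two features that matter, namely that distinct terms stay at distance 1 from each other while all of them sit inside a single ball.

```python
def d_disc(x, y):
    """Discrete metric: 0 if the points coincide, 1 otherwise."""
    return 0 if x == y else 1

# First fifty terms of the sequence 1, 2, 3, 4, ... in Z.
terms = list(range(1, 51))

# Collect the distances between all pairs of distinct terms.
pairwise = {d_disc(a, b) for i, a in enumerate(terms) for b in terms[i + 1:]}

# Every term lies in the ball B(0, 2), so the sampled set is bounded.
in_ball = all(d_disc(0, n) < 2 for n in terms)
```

Since `pairwise` contains only the value 1, no two distinct terms ever come closer than 1 to each other, which is the obstruction to extracting a Cauchy (and hence convergent) subsequence.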

One can characterize compactness topologically via the fol- 
lowing, rather strange-sounding statement: every open cover of a 
compact set has a finite subcover. 


Theorem 12.5.8. Let (X,d) be a metric space, and let Y be a
compact subset of X. Let $(V_\alpha)_{\alpha \in I}$ be a collection of open sets in
X, and suppose that

$$Y \subseteq \bigcup_{\alpha \in I} V_\alpha$$

(i.e., the collection $(V_\alpha)_{\alpha \in I}$ covers Y). Then there exists a finite
subset F of I such that

$$Y \subseteq \bigcup_{\alpha \in F} V_\alpha.$$


Proof. We assume for sake of contradiction that there does not
exist any finite subset F of I for which $Y \subseteq \bigcup_{\alpha \in F} V_\alpha$.

Let y be any element of Y. Then y must lie in at least one
of the sets $V_\alpha$. Since each $V_\alpha$ is open, there must therefore be
an r > 0 such that $B_{(X,d)}(y,r) \subseteq V_\alpha$. Now let r(y) denote the
quantity

$$r(y) := \sup\{r \in (0,\infty) : B_{(X,d)}(y,r) \subseteq V_\alpha \text{ for some } \alpha \in I\}.$$

By the above discussion, we know that r(y) > 0 for all $y \in Y$.
Now, let $r_0$ denote the quantity

$$r_0 := \inf\{r(y) : y \in Y\}.$$

Since r(y) > 0 for all $y \in Y$, we have $r_0 \geq 0$. There are two cases:
$r_0 = 0$ and $r_0 > 0$.


• Case 1: $r_0 = 0$. Then for every integer $n \geq 1$, there is at
least one point y in Y such that r(y) < 1/n (why?). We thus
choose, for each $n \geq 1$, a point $y^{(n)}$ in Y such that $r(y^{(n)}) < 1/n$
(we can do this because of the axiom of choice, see
Proposition 8.4.7). In particular we have $\lim_{n\to\infty} r(y^{(n)}) = 0$,
by the squeeze test. The sequence $(y^{(n)})_{n=1}^\infty$ is a sequence
in Y; since Y is compact, we can thus find a subsequence
$(y^{(n_j)})_{j=1}^\infty$ which converges to a point $y_0 \in Y$.

As before, we know that there exists some $\alpha \in I$ such that
$y_0 \in V_\alpha$, and hence (since $V_\alpha$ is open) there exists some
$\varepsilon > 0$ such that $B(y_0, \varepsilon) \subseteq V_\alpha$. Since $y^{(n_j)}$ converges to $y_0$,
there must exist an $N \geq 1$ such that $y^{(n_j)} \in B(y_0, \varepsilon/2)$ for
all $j \geq N$. In particular, by the triangle inequality we have
$B(y^{(n_j)}, \varepsilon/2) \subseteq B(y_0, \varepsilon)$, and thus $B(y^{(n_j)}, \varepsilon/2) \subseteq V_\alpha$. By
definition of $r(y^{(n_j)})$, this implies that $r(y^{(n_j)}) \geq \varepsilon/2$ for all
$j \geq N$. But this contradicts the fact that $\lim_{n\to\infty} r(y^{(n)}) = 0$.


• Case 2: $r_0 > 0$. In this case we now have $r(y) > r_0/2$ for
all $y \in Y$. This implies that for every $y \in Y$ there exists an
$\alpha \in I$ such that $B(y, r_0/2) \subseteq V_\alpha$ (why?).

We now construct a sequence $y^{(1)}, y^{(2)}, \ldots$ by the following
recursive procedure. We let $y^{(1)}$ be any point in Y.
The ball $B(y^{(1)}, r_0/2)$ is contained in one of the $V_\alpha$ and
thus cannot cover all of Y, since we would then obtain
a finite cover, a contradiction. Thus there exists a point
$y^{(2)}$ which does not lie in $B(y^{(1)}, r_0/2)$, so in particular
$d(y^{(2)}, y^{(1)}) \geq r_0/2$. Choose such a point $y^{(2)}$. The set
$B(y^{(1)}, r_0/2) \cup B(y^{(2)}, r_0/2)$ cannot cover all of Y, since we
would then obtain two sets $V_{\alpha_1}$ and $V_{\alpha_2}$ which covered Y,
a contradiction again. So we can choose a point $y^{(3)}$ which
does not lie in $B(y^{(1)}, r_0/2) \cup B(y^{(2)}, r_0/2)$, so in particular
$d(y^{(3)}, y^{(1)}) \geq r_0/2$ and $d(y^{(3)}, y^{(2)}) \geq r_0/2$. Continuing in
this fashion we obtain a sequence $(y^{(n)})_{n=1}^\infty$ in Y with the
property that $d(y^{(k)}, y^{(j)}) \geq r_0/2$ for all k > j. In particular
the sequence $(y^{(n)})_{n=1}^\infty$ is not a Cauchy sequence, and in fact
no subsequence of $(y^{(n)})_{n=1}^\infty$ can be a Cauchy sequence either.
But this contradicts the assumption that Y is compact
(by Lemma 12.4.7).


□
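
The recursive selection used in Case 2 of the proof can be imitated on a finite sample. The Python sketch below is an illustration under stated assumptions: `greedy_separated`, `grid` and `centres` are names invented here, and the grid in the unit square is an assumed example that does not appear in the proof. The function keeps a point only if it lies outside the balls of the given radius around all points kept so far, so the kept points are pairwise at distance at least `radius`.

```python
import math

def greedy_separated(points, radius, dist):
    """Keep each point that lies outside the open balls of the given
    radius around all previously kept points (as in Case 2)."""
    chosen = []
    for p in points:
        if all(dist(p, q) >= radius for q in chosen):
            chosen.append(p)
    return chosen

# Assumed sample: a 21 x 21 grid of points in the unit square [0,1]^2.
grid = [(i / 20, j / 20) for i in range(21) for j in range(21)]

# Points of the grid that are pairwise at Euclidean distance >= 0.5.
centres = greedy_separated(grid, 0.5, math.dist)
```

On this bounded sample the selection stops after a handful of points, and the balls of radius 0.5 around `centres` then cover the whole grid. In the proof, by contrast, the contradiction hypothesis forces the selection to continue forever, which produces a sequence with no Cauchy subsequence.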


It turns out that Theorem 12.5.8 has a converse: if Y has 
the property that every open cover has a finite sub-cover, then 
it is compact (Exercise 12.5.11). In fact, this property is often 
considered the more fundamental notion of compactness than the 
sequence-based one. (For metric spaces, the two notions, that 
of compactness and sequential compactness, are equivalent, but 
for more general topological spaces, the two notions are slightly 
different; see Exercise 13.5.8.) 

Theorem 12.5.8 has an important corollary: that the intersection
of every nested sequence of non-empty compact sets is still non-empty.


Corollary 12.5.9. Let (X,d) be a metric space, and let $K_1, K_2, K_3, \ldots$
be a sequence of non-empty compact subsets of X such that

$$K_1 \supseteq K_2 \supseteq K_3 \supseteq \ldots.$$

Then the intersection $\bigcap_{n=1}^\infty K_n$ is non-empty.


Proof. See Exercise 12.5.6. □
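
To see that compactness cannot simply be replaced by closedness in Corollary 12.5.9, it may help to keep the following standard example in mind. It is not taken from the text; the sets $K_n$ below are chosen here purely for illustration, in the real line with the usual metric.

```latex
K_n := [n, \infty) \subseteq \mathbf{R}, \qquad
K_1 \supseteq K_2 \supseteq K_3 \supseteq \ldots, \qquad
\bigcap_{n=1}^{\infty} K_n = \emptyset .
```

Each $K_n$ is closed and non-empty, but none is bounded, hence none is compact; any real number x fails to lie in $K_n$ as soon as n > x, so the intersection is empty.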


We close this section by listing some miscellaneous properties 
of compact sets. 


Theorem 12.5.10. Let (X,d) be a metric space. 


(a) If Y is a compact subset of X, and $Z \subseteq Y$, then Z is compact
if and only if Z is closed.

(b) If $Y_1, \ldots, Y_n$ are a finite collection of compact subsets of X,
then their union $Y_1 \cup \ldots \cup Y_n$ is also compact.


(c) Every finite subset of X (including the empty set) is com- 
pact. 


Proof. See Exercise 12.5.7. □


Exercise 12.5.1. Show that Definitions 9.1.22 and 12.5.3 match when 
talking about subsets of the real line with the standard metric. 


Exercise 12.5.2. Prove Proposition 12.5.5. (Hint: prove the complete- 
ness and boundedness separately. For both claims, use proof by contra- 
diction. You will need the axiom of choice, as in Lemma 8.4.5.) 


Exercise 12.5.3. Prove Theorem 12.5.7. (Hint: use Proposition 12.1.18 
and Theorem 9.1.24.) 


Exercise 12.5.4. Let (R,d) be the real line with the standard metric.
Give an example of a continuous function $f : \mathbf{R} \to \mathbf{R}$, and an open set
$V \subseteq \mathbf{R}$, such that the image $f(V) := \{f(x) : x \in V\}$ of V is not open.


Exercise 12.5.5. Let (R,d) be the real line with the standard metric.
Give an example of a continuous function $f : \mathbf{R} \to \mathbf{R}$, and a closed set
$F \subseteq \mathbf{R}$, such that f(F) is not closed.


Exercise 12.5.6. Prove Corollary 12.5.9. (Hint: work in the compact
metric space $(K_1, d|_{K_1 \times K_1})$, and consider the sets $V_n := K_1 \setminus K_n$, which
are open on $K_1$. Assume for sake of contradiction that $\bigcap_{n=1}^\infty K_n = \emptyset$,
and then apply Theorem 12.5.8.)


Exercise 12.5.7. Prove Theorem 12.5.10. (Hint: for part (c), you may 
wish to use (b), and first prove that every singleton set is compact.) 


Exercise 12.5.8. Let $(X, d_{l^1})$ be the metric space from Exercise 12.1.15.
For each natural number n, let $e^{(n)} = (e^{(n)}_j)_{j=0}^\infty$ be the sequence in X
such that $e^{(n)}_j := 1$ when n = j and $e^{(n)}_j := 0$ when $n \neq j$. Show that
the set $\{e^{(n)} : n \in \mathbf{N}\}$ is a closed and bounded subset of X, but is not
compact. (This is despite the fact that $(X, d_{l^1})$ is even a complete metric
space - a fact which we will not prove here. The problem is not
that X is incomplete, but rather that it is “infinite-dimensional”, in a
sense that we will not discuss here.)


Exercise 12.5.9. Show that a metric space (X,d) is compact if and only 
if every sequence in X has at least one limit point. 


Exercise 12.5.10. A metric space (X,d) is called totally bounded if for
every $\varepsilon > 0$, there exists a positive integer n and a finite number of balls
$B(x^{(1)}, \varepsilon), \ldots, B(x^{(n)}, \varepsilon)$ which cover X (i.e., $X = \bigcup_{i=1}^n B(x^{(i)}, \varepsilon)$).


(a) Show that every totally bounded space is bounded. 


(b) Show the following stronger version of Proposition 12.5.5: if (X, d)
is compact, then it is complete and totally bounded. (Hint: if X is not
totally bounded, then there is some $\varepsilon > 0$ such that X cannot be
covered by finitely many $\varepsilon$-balls. Then use Exercise 8.5.20 to find
an infinite sequence of balls $B(x^{(n)}, \varepsilon/2)$ which are disjoint from
each other. Use this to then construct a sequence which has no
convergent subsequence.)


(c) Conversely, show that if X is complete and totally bounded, then
X is compact. (Hint: if $(x^{(n)})_{n=1}^\infty$ is a sequence in X, use the
total boundedness hypothesis to recursively construct a sequence
of subsequences $(x^{(n;j)})_{n=1}^\infty$ of $(x^{(n)})_{n=1}^\infty$ for each positive integer j,
such that for each j, the elements of the sequence $(x^{(n;j)})_{n=1}^\infty$ are
contained in a single ball of radius 1/j, and also that each sequence
$(x^{(n;j+1)})_{n=1}^\infty$ is a subsequence of the previous one $(x^{(n;j)})_{n=1}^\infty$.
Then show that the “diagonal” sequence $(x^{(n;n)})_{n=1}^\infty$ is a Cauchy
sequence, and then use the completeness hypothesis.)


Exercise 12.5.11. Let (X,d) have the property that every open cover of
X has a finite subcover. Show that X is compact. (Hint: if X is not
compact, then by Exercise 12.5.9, there is a sequence $(x^{(n)})_{n=1}^\infty$ with no
limit points. Then for every $x \in X$ there exists a ball $B(x, \varepsilon)$ containing
x which contains at most finitely many elements of this sequence. Now
use the hypothesis.)


Exercise 12.5.12. Let $(X, d_{disc})$ be a metric space with the discrete
metric $d_{disc}$.

(a) Show that X is always complete.

(b) When is X compact, and when is X not compact? Prove your
claim. (Hint: the Heine-Borel theorem will be useless here since 
that only applies to Euclidean spaces with the Euclidean metric.) 


Exercise 12.5.13. Let E and F be two compact subsets of R (with the
standard metric d(x,y) = |x - y|). Show that the Cartesian product
$E \times F := \{(x,y) : x \in E, y \in F\}$ is a compact subset of $\mathbf{R}^2$ (with the
Euclidean metric $d_{l^2}$).


Exercise 12.5.14. Let (X,d) be a metric space, let E be a non-empty
compact subset of X, and let $x_0$ be a point in X. Show that there exists
a point $x \in E$ such that

$$d(x_0, x) = \inf\{d(x_0, y) : y \in E\},$$

i.e., x is the closest point in E to $x_0$. (Hint: let R be the quantity
$R := \inf\{d(x_0, y) : y \in E\}$. Construct a sequence $(x^{(n)})_{n=1}^\infty$ in E such
that $d(x_0, x^{(n)}) \leq R + \frac{1}{n}$, and then use the compactness of E.)


Exercise 12.5.15. Let (X,d) be a compact metric space. Suppose that
$(K_\alpha)_{\alpha \in I}$ is a collection of closed sets in X with the property that any
finite subcollection of these sets necessarily has non-empty intersection,
thus $\bigcap_{\alpha \in F} K_\alpha \neq \emptyset$ for all finite $F \subseteq I$. (This property is known as the
finite intersection property.) Show that the entire collection has
non-empty intersection, thus $\bigcap_{\alpha \in I} K_\alpha \neq \emptyset$. Show by counterexample
that this statement fails if X is not compact.


Chapter 13 


Continuous functions on metric spaces 


13.1 Continuous functions 


In the previous chapter we studied a single metric space (X,d), 
and the various types of sets one could find in that space. While 
this is already quite a rich subject, the theory of metric spaces 
becomes even richer, and of more importance to analysis, when one 
considers not just a single metric space, but rather pairs $(X, d_X)$
and $(Y, d_Y)$ of metric spaces, as well as continuous functions
$f : X \to Y$ between such spaces. To define this concept, we generalize
Definition 9.4.1 as follows: 


Definition 13.1.1 (Continuous functions). Let $(X, d_X)$ be a metric
space, and let $(Y, d_Y)$ be another metric space, and let $f : X \to Y$
be a function. If $x_0 \in X$, we say that f is continuous at $x_0$ iff
for every $\varepsilon > 0$, there exists a $\delta > 0$ such that $d_Y(f(x), f(x_0)) < \varepsilon$
whenever $d_X(x, x_0) < \delta$. We say that f is continuous iff it is
continuous at every point $x \in X$.
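
The quantifiers in Definition 13.1.1 can be tested numerically for a concrete function. The Python sketch below is an illustration only: `is_delta_ok`, `abs_metric`, `square` and the sample grid are names and choices made here, and passing a finite check is evidence rather than a proof. It takes X = Y = R with the usual metric, $f(x) = x^2$ and $x_0 = 3$, and uses the choice $\delta = \min(1, \varepsilon/(2|x_0|+1))$, which works because $|x^2 - x_0^2| = |x - x_0||x + x_0|$ and $|x + x_0| < 2|x_0| + 1$ when $|x - x_0| < 1$.

```python
def is_delta_ok(f, x0, eps, delta, d_X, d_Y, samples):
    """Check on the sample points that d_X(x, x0) < delta
    forces d_Y(f(x), f(x0)) < eps."""
    return all(
        d_Y(f(x), f(x0)) < eps
        for x in samples
        if d_X(x, x0) < delta
    )

def abs_metric(a, b):
    """Usual metric on the real line."""
    return abs(a - b)

def square(x):
    return x * x

x0, eps = 3.0, 0.1
delta = min(1.0, eps / (2 * abs(x0) + 1))
samples = [x0 + k / 10000 for k in range(-20000, 20001)]

ok = is_delta_ok(square, x0, eps, delta, abs_metric, abs_metric, samples)
too_big = is_delta_ok(square, x0, eps, 1.0, abs_metric, abs_metric, samples)
```

Here `ok` records that the chosen $\delta$ passes the check, while `too_big` records that the larger value $\delta = 1$ does not; the definition only asks that some $\delta > 0$ works for each $\varepsilon$.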


Remark 13.1.2. Continuous functions are also sometimes called 
continuous maps. Mathematically, there is no distinction between 
the two terminologies. 


Remark 13.1.3. If $f : X \to Y$ is continuous, and K is any
subset of X, then the restriction $f|_K : K \to Y$ of f to K is also
continuous (why?).


We now generalize much of the discussion in Chapter 9. We 
first observe that continuous functions preserve convergence: 


Theorem 13.1.4 (Continuity preserves convergence). Suppose
that $(X, d_X)$ and $(Y, d_Y)$ are metric spaces. Let $f : X \to Y$ be
a function, and let $x_0 \in X$ be a point in X. Then the following
three statements are logically equivalent:

(a) f is continuous at $x_0$.

(b) Whenever $(x^{(n)})_{n=1}^\infty$ is a sequence in X which converges to
$x_0$ with respect to the metric $d_X$, the sequence $(f(x^{(n)}))_{n=1}^\infty$
converges to $f(x_0)$ with respect to the metric $d_Y$.

(c) For every open set $V \subseteq Y$ that contains $f(x_0)$, there exists
an open set $U \subseteq X$ containing $x_0$ such that $f(U) \subseteq V$.


Proof. See Exercise 13.1.1. □


Another important characterization of continuous functions in- 
volves open sets. 


Theorem 13.1.5. Let $(X, d_X)$ be a metric space, and let $(Y, d_Y)$
be another metric space. Let $f : X \to Y$ be a function. Then the
following four statements are equivalent:

(a) f is continuous.

(b) Whenever $(x^{(n)})_{n=1}^\infty$ is a sequence in X which converges to
some point $x_0 \in X$ with respect to the metric $d_X$, the
sequence $(f(x^{(n)}))_{n=1}^\infty$ converges to $f(x_0)$ with respect to the
metric $d_Y$.

(c) Whenever V is an open set in Y, the set
$f^{-1}(V) := \{x \in X : f(x) \in V\}$ is an open set in X.

(d) Whenever F is a closed set in Y, the set
$f^{-1}(F) := \{x \in X : f(x) \in F\}$ is a closed set in X.


Proof. See Exercise 13.1.2. □


Remark 13.1.6. It may seem strange that continuity ensures 
that the inverse image of an open set is open. One may guess 
instead that the reverse should be true, that the forward image 
of an open set is open; but this is not true; see Exercises 12.5.4, 
12.5.5. 


As a quick corollary of the above two Theorems we obtain 


Corollary 13.1.7 (Continuity preserved by composition). Let
$(X, d_X)$, $(Y, d_Y)$, and $(Z, d_Z)$ be metric spaces.

(a) If $f : X \to Y$ is continuous at a point $x_0 \in X$, and $g : Y \to Z$
is continuous at $f(x_0)$, then the composition $g \circ f : X \to Z$,
defined by $g \circ f(x) := g(f(x))$, is continuous at $x_0$.

(b) If $f : X \to Y$ is continuous, and $g : Y \to Z$ is continuous,
then $g \circ f : X \to Z$ is also continuous.


Proof. See Exercise 13.1.3. □


Example 13.1.8. If $f : X \to \mathbf{R}$ is a continuous function, then the
function $f^2 : X \to \mathbf{R}$ defined by $f^2(x) := f(x)^2$ is automatically
continuous also. This is because we have $f^2 = g \circ f$, where
$g : \mathbf{R} \to \mathbf{R}$ is the squaring function $g(x) := x^2$, and g is a continuous
function.


Exercise 13.1.1. Prove Theorem 13.1.4. (Hint: review your proof of 
Proposition 9.4.7.) 


Exercise 13.1.2. Prove Theorem 13.1.5. (Hint: Theorem 13.1.4 already 
shows that (a) and (b) are equivalent.) 


Exercise 13.1.3. Use Theorem 13.1.4 and Theorem 13.1.5 to prove Corol- 
lary 13.1.7. 


Exercise 13.1.4. Give an example of functions $f : \mathbf{R} \to \mathbf{R}$ and $g : \mathbf{R} \to \mathbf{R}$
such that

(a) f is not continuous, but g and $g \circ f$ are continuous;

(b) g is not continuous, but f and $g \circ f$ are continuous;

(c) f and g are not continuous, but $g \circ f$ is continuous.

Explain briefly why these examples do not contradict Corollary 13.1.7.

Exercise 13.1.5. Let (X,d) be a metric space, and let $(E, d|_{E \times E})$ be a
subspace of (X,d). Let $\iota_{E \to X} : E \to X$ be the inclusion map, defined
by setting $\iota_{E \to X}(x) := x$ for all $x \in E$. Show that $\iota_{E \to X}$ is continuous.

Exercise 13.1.6. Let $f : X \to Y$ be a function from one metric space
$(X, d_X)$ to another $(Y, d_Y)$. Let E be a subset of X (which we give the
induced metric $d_X|_{E \times E}$), and let $f|_E : E \to Y$ be the restriction of f
to E, thus $f|_E(x) := f(x)$ when $x \in E$. If $x_0 \in E$ and f is continuous
at $x_0$, show that $f|_E$ is also continuous at $x_0$. (Is the converse of this
statement true? Explain.) Conclude that if f is continuous, then $f|_E$
is continuous. Thus restriction of the domain of a function does not
destroy continuity. (Hint: use Exercise 13.1.5.)

Exercise 13.1.7. Let $f : X \to Y$ be a function from one metric space
$(X, d_X)$ to another $(Y, d_Y)$. Suppose that the image f(X) of X is
contained in some subset $E \subseteq Y$ of Y. Let $g : X \to E$ be the function
which is the same as f but with the range restricted from Y to E, thus
g(x) = f(x) for all $x \in X$. We give E the metric $d_Y|_{E \times E}$ induced from
Y. Show that for any $x_0 \in X$, f is continuous at $x_0$ if and only if
g is continuous at $x_0$. Conclude that f is continuous if and only if g is
continuous. (Thus the notion of continuity is not affected if one restricts
the range of the function.)


13.2 Continuity and product spaces 


Given two functions $f : X \to Y$ and $g : X \to Z$, one can define
their direct sum $f \oplus g : X \to Y \times Z$ by $f \oplus g(x) :=
(f(x), g(x))$, i.e., this is the function taking values in the Cartesian
product $Y \times Z$ whose first co-ordinate is f(x) and whose second
co-ordinate is g(x) (cf. Exercise 3.5.7). For instance, if $f : \mathbf{R} \to \mathbf{R}$ is
the function $f(x) := x^2 + 3$, and $g : \mathbf{R} \to \mathbf{R}$ is the function $g(x) := 4x$,
then $f \oplus g : \mathbf{R} \to \mathbf{R}^2$ is the function $f \oplus g(x) := (x^2 + 3, 4x)$.
The direct sum operation preserves continuity: 


Lemma 13.2.1. Let $f : X \to \mathbf{R}$ and $g : X \to \mathbf{R}$ be functions,
and let $f \oplus g : X \to \mathbf{R}^2$ be their direct sum. We give $\mathbf{R}^2$ the
Euclidean metric.

(a) If $x_0 \in X$, then f and g are both continuous at $x_0$ if and
only if $f \oplus g$ is continuous at $x_0$.

(b) f and g are both continuous if and only if $f \oplus g$ is continuous.

Proof. See Exercise 13.2.1. □

To use this, we first need another continuity result:


Lemma 13.2.2. The addition function $(x,y) \mapsto x + y$, the
subtraction function $(x,y) \mapsto x - y$, the multiplication function
$(x,y) \mapsto xy$, the maximum function $(x,y) \mapsto \max(x,y)$, and the minimum
function $(x,y) \mapsto \min(x,y)$, are all continuous functions from $\mathbf{R}^2$
to R. The division function $(x,y) \mapsto x/y$ is a continuous function
from $\mathbf{R} \times (\mathbf{R} \setminus \{0\}) = \{(x,y) \in \mathbf{R}^2 : y \neq 0\}$ to R. For any real
number c, the function $x \mapsto cx$ is a continuous function from R
to R.


Proof. See Exercise 13.2.2. □

Combining these lemmas we obtain 


Corollary 13.2.3. Let (X,d) be a metric space, let $f : X \to \mathbf{R}$
and $g : X \to \mathbf{R}$ be functions. Let c be a real number.

(a) If $x_0 \in X$ and f and g are continuous at $x_0$, then the
functions $f + g : X \to \mathbf{R}$, $f - g : X \to \mathbf{R}$, $fg : X \to \mathbf{R}$,
$\max(f,g) : X \to \mathbf{R}$, $\min(f,g) : X \to \mathbf{R}$, and $cf : X \to \mathbf{R}$
(see Definition 9.2.1 for definitions) are also continuous at
$x_0$. If $g(x) \neq 0$ for all $x \in X$, then $f/g : X \to \mathbf{R}$ is also
continuous at $x_0$.

(b) If f and g are continuous, then the functions $f + g : X \to \mathbf{R}$,
$f - g : X \to \mathbf{R}$, $fg : X \to \mathbf{R}$, $\max(f,g) : X \to \mathbf{R}$,
$\min(f,g) : X \to \mathbf{R}$, and $cf : X \to \mathbf{R}$ are also continuous.
If $g(x) \neq 0$ for all $x \in X$, then $f/g : X \to \mathbf{R}$ is also
continuous.


Proof. We first prove (a). Since f and g are continuous at $x_0$, then
by Lemma 13.2.1 $f \oplus g : X \to \mathbf{R}^2$ is also continuous at $x_0$. On
the other hand, from Lemma 13.2.2 the function $(x,y) \mapsto x + y$ is
continuous at every point in $\mathbf{R}^2$, and in particular is continuous at
$f \oplus g(x_0)$. If we then compose these two functions using Corollary
13.1.7 we conclude that $f + g : X \to \mathbf{R}$ is continuous at $x_0$. A similar
argument gives the continuity of $f - g$, $fg$, $\max(f,g)$, $\min(f,g)$
and $cf$ at $x_0$. To prove the claim for f/g, we first use Exercise 13.1.7 to
restrict the range of g from R to $\mathbf{R} \setminus \{0\}$, and then one can argue
as before. The claim (b) follows immediately from (a). □


This corollary allows us to demonstrate the continuity of a 
large class of functions; we give some examples below. 
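
The structure of the proof of Corollary 13.2.3 (form the direct sum, then compose with a continuous map on $\mathbf{R}^2$) can be mirrored directly in code. The Python sketch below is only an illustration; `direct_sum`, `compose`, `add` and `maximum` are names chosen here. It reuses the functions $f(x) := x^2 + 3$ and $g(x) := 4x$ from the start of this section.

```python
def direct_sum(f, g):
    """The direct sum of f and g: x -> (f(x), g(x))."""
    return lambda x: (f(x), g(x))

def compose(outer, inner):
    """The composition of outer with inner: x -> outer(inner(x))."""
    return lambda x: outer(inner(x))

def add(pair):
    """Addition as a map from R^2 to R."""
    return pair[0] + pair[1]

def maximum(pair):
    """Maximum as a map from R^2 to R."""
    return max(pair[0], pair[1])

f = lambda x: x * x + 3
g = lambda x: 4 * x

# f + g and max(f, g), each built as (map on R^2) composed with the direct sum.
f_plus_g = compose(add, direct_sum(f, g))
f_max_g = compose(maximum, direct_sum(f, g))
```

Written this way, the continuity of f + g or max(f, g) is visibly a consequence of three facts: the continuity of the direct sum (Lemma 13.2.1), the continuity of the map on $\mathbf{R}^2$ (Lemma 13.2.2), and the preservation of continuity under composition (Corollary 13.1.7).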


Exercise 13.2.1. Prove Lemma 13.2.1. (Hint: use Proposition 12.1.18 
and Theorem 13.1.4.) 


Exercise 13.2.2. Prove Lemma 13.2.2. (Hint: use Theorem 13.1.5 and 
limit laws (Theorem 6.1.19).) 


Exercise 13.2.3. Show that if $f : X \to \mathbf{R}$ is a continuous function, so is
the function $|f| : X \to \mathbf{R}$ defined by $|f|(x) := |f(x)|$.


Exercise 13.2.4. Let $\pi_1 : \mathbf{R}^2 \to \mathbf{R}$ and $\pi_2 : \mathbf{R}^2 \to \mathbf{R}$ be the functions
$\pi_1(x,y) := x$ and $\pi_2(x,y) := y$ (these two functions are sometimes called
the co-ordinate functions on $\mathbf{R}^2$). Show that $\pi_1$ and $\pi_2$ are continuous.
Conclude that if $f : \mathbf{R} \to X$ is any continuous function into a metric
space (X, d), then the functions $g_1 : \mathbf{R}^2 \to X$ and $g_2 : \mathbf{R}^2 \to X$ defined
by $g_1(x,y) := f(x)$ and $g_2(x,y) := f(y)$ are also continuous.


Exercise 13.2.5. Let $n, m \geq 0$ be integers. Suppose that for every
$0 \leq i \leq n$ and $0 \leq j \leq m$ we have a real number $c_{ij}$. Form the function
$P : \mathbf{R}^2 \to \mathbf{R}$ defined by

$$P(x,y) := \sum_{i=0}^{n} \sum_{j=0}^{m} c_{ij} x^i y^j.$$

(Such a function is known as a polynomial of two variables; a typical
example of such a polynomial is $P(x,y) = x^3 + 2xy^2 - x^2 + 3y + 6$.)
Show that P is continuous. (Hint: use Exercise 13.2.4 and Corollary
13.2.3.) Conclude that if $f : X \to \mathbf{R}$ and $g : X \to \mathbf{R}$ are continuous
functions, then the function $P(f,g) : X \to \mathbf{R}$ defined by $P(f,g)(x) :=
P(f(x), g(x))$ is also continuous.


Exercise 13.2.6. Let $\mathbf{R}^m$ and $\mathbf{R}^n$ be Euclidean spaces. If $f : X \to \mathbf{R}^m$
and $g : X \to \mathbf{R}^n$ are continuous functions, show that $f \oplus g : X \to \mathbf{R}^{m+n}$
is also continuous, where we have identified $\mathbf{R}^m \times \mathbf{R}^n$ with $\mathbf{R}^{m+n}$ in the
obvious manner. Is the converse statement true?


Exercise 13.2.7. Let $k \geq 1$, let I be a finite subset of $\mathbf{N}^k$, and let
$c : I \to \mathbf{R}$ be a function. Form the function $P : \mathbf{R}^k \to \mathbf{R}$ defined by

$$P(x_1, \ldots, x_k) := \sum_{(i_1, \ldots, i_k) \in I} c(i_1, \ldots, i_k) x_1^{i_1} \ldots x_k^{i_k}.$$

(Such a function is known as a polynomial of k variables; a typical
example of such a polynomial is $P(x_1, x_2, x_3) = 3x_1^3 x_2 x_3^2 - x_2 x_3^2 + x_1 + 5$.)
Show that P is continuous. (Hint: use induction on k, Exercise 13.2.6,
and either Exercise 13.2.5 or Lemma 13.2.2.)


Exercise 13.2.8. Let $(X, d_X)$ and $(Y, d_Y)$ be metric spaces. Define the
metric $d_{X \times Y} : (X \times Y) \times (X \times Y) \to [0, \infty)$ by the formula

$$d_{X \times Y}((x,y), (x',y')) := d_X(x,x') + d_Y(y,y').$$

Show that $(X \times Y, d_{X \times Y})$ is a metric space, and deduce an analogue of
Proposition 12.1.18 and Lemma 13.2.1.


Exercise 13.2.9. Let $f : \mathbf{R}^2 \to \mathbf{R}$ be a function, and let $(x_0, y_0)$
be a point in $\mathbf{R}^2$. If f is continuous at $(x_0, y_0)$,
show that

$$\lim_{x \to x_0} \limsup_{y \to y_0} f(x,y) = \lim_{y \to y_0} \limsup_{x \to x_0} f(x,y) = f(x_0, y_0)$$

and

$$\lim_{x \to x_0} \liminf_{y \to y_0} f(x,y) = \lim_{y \to y_0} \liminf_{x \to x_0} f(x,y) = f(x_0, y_0).$$

In particular, we have

$$\lim_{x \to x_0} \lim_{y \to y_0} f(x,y) = \lim_{y \to y_0} \lim_{x \to x_0} f(x,y)$$

whenever the limits on both sides exist. (Note that the limits do not
necessarily exist in general; consider for instance the function
$f : \mathbf{R}^2 \to \mathbf{R}$ such that $f(x,y) = y \sin\frac{1}{x}$ when $xy \neq 0$ and f(x,y) = 0 otherwise.)
Discuss the comparison between this result and Example 1.2.7.


Exercise 13.2.10. Let $f : \mathbf{R}^2 \to \mathbf{R}$ be a continuous function. Show
that for each $x \in \mathbf{R}$, the function $y \mapsto f(x,y)$ is continuous on R, and
for each $y \in \mathbf{R}$, the function $x \mapsto f(x,y)$ is continuous on R. Thus a
function f(x,y) which is jointly continuous in (x,y) is also continuous
in each variable x, y separately.


Exercise 13.2.11. Let $f : \mathbf{R}^2 \to \mathbf{R}$ be the function defined by $f(x,y) :=
\frac{xy}{x^2 + y^2}$ when $(x,y) \neq (0,0)$, and f(x,y) = 0 otherwise. Show that for
each fixed $x \in \mathbf{R}$, the function $y \mapsto f(x,y)$ is continuous on R, and that
for each fixed $y \in \mathbf{R}$, the function $x \mapsto f(x,y)$ is continuous on R, but
that the function $f : \mathbf{R}^2 \to \mathbf{R}$ is not continuous on $\mathbf{R}^2$. This shows that
the converse to Exercise 13.2.10 fails; it is possible to be continuous in
each variable separately without being jointly continuous.


13.3 Continuity and compactness 


Continuous functions interact well with the concept of compact 
sets defined in Definition 12.5.1. 


Theorem 13.3.1 (Continuous maps preserve compactness). Let
$f : X \to Y$ be a continuous map from one metric space $(X, d_X)$ to
another $(Y, d_Y)$. Let $K \subseteq X$ be any compact subset of X. Then
the image $f(K) := \{f(x) : x \in K\}$ of K is also compact.


Proof. See Exercise 13.3.1. □


This theorem has an important consequence. Recall from
Definition 9.6.5 the notion of a function $f : X \to \mathbf{R}$ attaining a
maximum or minimum at a point. We may generalize
Proposition 9.6.7 as follows:


Proposition 13.3.2 (Maximum principle). Let (X,d) be a compact
metric space, and let $f : X \to \mathbf{R}$ be a continuous function.
Then f is bounded. Furthermore, f attains its maximum at some
point $x_{max} \in X$, and also attains its minimum at some point
$x_{min} \in X$.


Proof. See Exercise 13.3.2. □


Remark 13.3.3. As was already noted in Exercise 9.6.1, this 
principle can fail if X is not compact. This proposition should be 
compared with Lemma 9.6.3 and Proposition 9.6.7. 


Another advantage of continuous functions on compact sets 
is that they are uniformly continuous. We generalize Definition 
9.9.2 as follows: 


Definition 13.3.4 (Uniform continuity). Let $f : X \to Y$ be a
map from one metric space $(X, d_X)$ to another $(Y, d_Y)$. We say
that f is uniformly continuous if, for every $\varepsilon > 0$, there exists a
$\delta > 0$ such that $d_Y(f(x), f(x')) < \varepsilon$ whenever $x, x' \in X$ are such
that $d_X(x, x') < \delta$.


Every uniformly continuous function is continuous, but not 
conversely (Exercise 13.3.3). But if the domain X is compact, 
then the two notions are equivalent: 


Theorem 13.3.5. Let $(X, d_X)$ and $(Y, d_Y)$ be metric spaces, and
suppose that $(X, d_X)$ is compact. If $f : X \to Y$ is a function, then
f is continuous if and only if it is uniformly continuous.


Proof. If f is uniformly continuous then it is also continuous by
Exercise 13.3.3. Now suppose that f is continuous. Fix $\varepsilon > 0$.
For every $x_0 \in X$, the function f is continuous at $x_0$. Thus there
exists a $\delta(x_0) > 0$, depending on $x_0$, such that $d_Y(f(x), f(x_0)) < \varepsilon/2$
whenever $d_X(x, x_0) < \delta(x_0)$. In particular, by the triangle
inequality this implies that $d_Y(f(x), f(x')) < \varepsilon$ whenever
$x \in B_{(X,d_X)}(x_0, \delta(x_0)/2)$ and $d_X(x', x) < \delta(x_0)/2$ (why?).

Now consider the (possibly infinite) collection of balls

$$\{B_{(X,d_X)}(x_0, \delta(x_0)/2) : x_0 \in X\}.$$

Each ball in this collection is of course open, and the union of
all these balls covers X, since each point $x_0$ in X is contained
in its own ball $B_{(X,d_X)}(x_0, \delta(x_0)/2)$. Hence, by Theorem 12.5.8,
there exist a finite number of points $x_1, \ldots, x_n$ such that the balls
$B_{(X,d_X)}(x_j, \delta(x_j)/2)$ for $j = 1, \ldots, n$ cover X:

$$X \subseteq \bigcup_{j=1}^{n} B_{(X,d_X)}(x_j, \delta(x_j)/2).$$

Now let $\delta := \min_{j=1}^{n} \delta(x_j)/2$. Since each of the $\delta(x_j)$ are positive,
and there are only a finite number of j, we see that $\delta > 0$. Now
let x, x' be any two points in X such that $d_X(x, x') < \delta$. Since the
balls $B_{(X,d_X)}(x_j, \delta(x_j)/2)$ cover X, we see that there must exist
$1 \leq j \leq n$ such that $x \in B_{(X,d_X)}(x_j, \delta(x_j)/2)$. Since $d_X(x, x') < \delta$,
we have $d_X(x, x') < \delta(x_j)/2$, and so by the previous discussion
we have $d_Y(f(x), f(x')) < \varepsilon$. We have thus found a $\delta$ such that
$d_Y(f(x), f(x')) < \varepsilon$ whenever $d_X(x, x') < \delta$, and this proves uniform
continuity as desired. □
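
The difference between a $\delta$ that may depend on the point and a single $\delta$ that works everywhere can be seen numerically. The Python sketch below is an illustration only: `worst_gap`, `square` and the sampling scheme are choices made here, and a finite sample is evidence, not a proof. For $f(x) = x^2$ on [0, 1] one has $|x^2 - x'^2| = |x - x'||x + x'| \leq 2|x - x'|$, so $\delta = \varepsilon/2$ works at every point; the sketch then tries the same $\delta$ on the longer interval [0, 100].

```python
def worst_gap(f, a, b, delta, samples=2000):
    """Largest |f(x) - f(x')| found over sampled pairs x, x' in [a, b]
    with |x - x'| < delta."""
    worst = 0.0
    for i in range(samples + 1):
        x = a + (b - a) * i / samples
        x_prime = min(x + 0.999 * delta, b)   # a partner just inside delta
        worst = max(worst, abs(f(x) - f(x_prime)))
    return worst

def square(x):
    return x * x

eps = 0.1
delta = eps / 2

gap_small = worst_gap(square, 0.0, 1.0, delta)     # interval [0, 1]
gap_large = worst_gap(square, 0.0, 100.0, delta)   # interval [0, 100]
```

On [0, 1] the gap stays below $\varepsilon$, whereas on [0, 100] the same $\delta$ produces gaps far larger than $\varepsilon$. The interval [0, 100] is still compact, so Theorem 13.3.5 guarantees that some smaller $\delta$ works there; on the whole real line, which is not compact, no single $\delta$ works for this function.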


Exercise 13.3.1. Prove Theorem 13.3.1.


Exercise 13.3.2. Prove Proposition 13.3.2. (Hint: modify the proof of
Proposition 9.6.7.) 

Exercise 13.3.3. Show that every uniformly continuous function is con- 
tinuous, but give an example that shows that not every continuous func- 
tion is uniformly continuous. 

Exercise 13.3.4. Let $(X, d_X)$, $(Y, d_Y)$, $(Z, d_Z)$ be metric spaces, and let
$f : X \to Y$ and $g : Y \to Z$ be two uniformly continuous functions. Show
that $g \circ f : X \to Z$ is also uniformly continuous.

Exercise 13.3.5. Let $(X, d_X)$ be a metric space, and let $f : X \to \mathbf{R}$ and
$g : X \to \mathbf{R}$ be uniformly continuous functions. Show that the direct
sum $f \oplus g : X \to \mathbf{R}^2$ defined by $f \oplus g(x) := (f(x), g(x))$ is uniformly
continuous.

Exercise 13.3.6. Show that the addition function $(x,y) \mapsto x + y$ and the
subtraction function $(x,y) \mapsto x - y$ are uniformly continuous from $\mathbf{R}^2$
to R, but the multiplication function $(x,y) \mapsto xy$ is not. Conclude that
if $f : X \to \mathbf{R}$ and $g : X \to \mathbf{R}$ are uniformly continuous functions on
a metric space (X,d), then $f + g : X \to \mathbf{R}$ and $f - g : X \to \mathbf{R}$ are
also uniformly continuous. Give an example to show that $fg : X \to \mathbf{R}$
need not be uniformly continuous. What is the situation for max(f, g),
min(f,g), f/g, and cf for a real number c?


13.4 Continuity and connectedness 


We now describe another important concept in metric spaces, that 
of connectedness. 


Definition 13.4.1 (Connected spaces). Let (X,d) be a metric
space. We say that X is disconnected iff there exist disjoint
non-empty open sets V and W in X such that $V \cup W = X$.
(Equivalently, X is disconnected if and only if X contains a non-empty
proper subset which is simultaneously closed and open.) We say
that X is connected iff it is non-empty and not disconnected.


We declare the empty set $\emptyset$ as being special - it is neither
connected nor disconnected; one could think of the empty set as
“unconnected”.


Example 13.4.2. Consider the set X := [1,2] U [3,4], with the 
usual metric. This set is disconnected because the sets [1,2] and 
[3, 4] are open relative to X (why?). 


Intuitively, a disconnected set is one which can be separated 
into two disjoint open sets; a connected set is one which cannot be 
separated in this manner. We defined what it means for a metric 
space to be connected; we can also define what it means for a set 
to be connected. 


Definition 13.4.3 (Connected sets). Let (X, d) be a metric space,
and let Y be a subset of X. We say that Y is connected iff the
metric space $(Y, d|_{Y \times Y})$ is connected, and we say that Y is
disconnected iff the metric space $(Y, d|_{Y \times Y})$ is disconnected.


Remark 13.4.4. This definition is intrinsic; whether a set Y is
connected or not depends only on what the metric is doing on Y,
but not on what ambient space X one is placing Y in.


On the real line, connected sets are easy to describe. 


Theorem 13.4.5. Let X be a subset of the real line R. Then the 
following statements are equivalent. 


(a) X is connected. 


(b) Whenever $x, y \in X$ and x < y, the interval [x,y] is also
contained in X.


(c) X is an interval (in the sense of Definition 9.1.1). 


Proof. First we show that (a) implies (b). Suppose that X is
connected, and suppose for sake of contradiction that we could
find points x < y in X such that [x,y] is not contained in X.
Then there exists a real number x < z < y such that $z \notin X$. Thus
the sets $(-\infty, z) \cap X$ and $(z, \infty) \cap X$ will cover X. But these sets
are non-empty (because they contain x and y respectively) and
are open relative to X, and so X is disconnected, a contradiction.

Now we show that (b) implies (a). Let X be a set obeying
the property (b). Suppose for sake of contradiction that X is
disconnected. Then there exist disjoint non-empty sets V, W
which are open relative to X, such that $V \cup W = X$. Since V and
W are non-empty, we may choose an $x \in V$ and $y \in W$. Since
V and W are disjoint, we have $x \neq y$; without loss of generality
we may assume x < y. By property (b), we know that the entire
interval [x,y] is contained in X.

Now consider the set [x,y] MV. This set is both bounded and 
non-empty (because it contains x). Thus it has a supremum 


z := sup([z, y] NV). 


Clearly z € [x,y], and hence z € X. Thus either z € V or 
z'e W. Suppose first that z € V. Then z # y (since y € 
W and V is disjoint from W). But V is open relative to X, 
which contains [x,y], so there is some ball Byzy),a)(z,7) which 
is contained in V. But this contradicts the fact that z is the 
supremum of [x,y] NV. Now suppose that z € W. Then z # x 
(since x € V and V is disjoint from W}. But W is open relative 
to X, which contains [x,y], so there is some ball Biz yj a)(z,7) 
which is contained in W. But this again contradicts the fact that 
z is the supremum of [z,y] N V. Thus in either case we obtain a 
contradiction, which means that X cannot be disconnected, and 
must therefore be connected. 

It remains to show that (b) and (c) are equivalent; we leave
this to Exercise 13.4.3. □


Continuous functions map connected sets to connected sets: 


Theorem 13.4.6 (Continuity preserves connectedness). Let f :
X → Y be a continuous map from one metric space $(X, d_X)$ to
another $(Y, d_Y)$. Let E be any connected subset of X. Then f(E)
is also connected.


Proof. See Exercise 13.4.4. □


An important corollary of this result is the intermediate value 
theorem, generalizing Theorem 9.7.1. 


Corollary 13.4.7 (Intermediate value theorem). Let f : X → R
be a continuous map from one metric space $(X, d_X)$ to the real
line. Let E be any connected subset of X, and let a, b be any two
elements of E. Let y be a real number between f(a) and f(b), i.e.,
either f(a) ≤ y ≤ f(b) or f(a) ≥ y ≥ f(b). Then there exists
c ∈ E such that f(c) = y.


Proof. See Exercise 13.4.5. □
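
When the connected set E is an interval [a, b] of the real line, the point c promised by Corollary 13.4.7 can be approximated numerically by bisection. The following Python sketch is an illustration only and plays no role in the proof; the name bisect_value and the parameters tol and max_iter are hypothetical choices made for this example, and the function f is assumed to be continuous on [a, b].

```python
def bisect_value(f, a, b, y, tol=1e-12, max_iter=200):
    """Approximate some c between a and b with f(c) = y.

    Assumes f is continuous and y lies between f(a) and f(b).
    """
    fa, fb = f(a) - y, f(b) - y
    if fa == 0:
        return a
    if fb == 0:
        return b
    if fa * fb > 0:
        raise ValueError("y does not lie between f(a) and f(b)")
    lo, hi, flo = a, b, fa
    for _ in range(max_iter):
        mid = (lo + hi) / 2
        fmid = f(mid) - y
        if fmid == 0 or abs(hi - lo) < tol:
            return mid
        if flo * fmid < 0:
            # the sign change happens between lo and mid
            hi = mid
        else:
            # the sign change happens between mid and hi
            lo, flo = mid, fmid
    return (lo + hi) / 2


# f(x) = x^3 - x takes the values 0 and 6 at x = 1 and x = 2,
# so it must take the intermediate value 3 somewhere in [1, 2].
c = bisect_value(lambda x: x**3 - x, 1.0, 2.0, 3.0)
print(c, c**3 - c)
```

Bisection only uses the sign of f − y at the endpoints of a shrinking interval, which is exactly the information that the intermediate value theorem works with.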


Exercise 13.4.1. Let $(X, d_{disc})$ be a metric space with the discrete metric.
Let E be a subset of X which contains at least two elements. Show that
E is disconnected.


Exercise 13.4.2. Let f : X → Y be a function from a connected metric
space (X, d) to a metric space $(Y, d_{disc})$ with the discrete metric. Show
that f is continuous if and only if it is constant. (Hint: use Exercise
13.4.1.)


Exercise 13.4.3. Prove the equivalence of statements (b) and (c) in The- 
orem 13.4.5. 


Exercise 13.4.4. Prove Theorem 13.4.6. (Hint: the formulation of con- 
tinuity in Theorem 13.1.5(c) is the most convenient to use.) 


Exercise 13.4.5. Use Theorem 13.4.6 to prove Corollary 13.4.7. 


Exercise 13.4.6. Let (X, d) be a metric space, and let $(E_\alpha)_{\alpha \in I}$ be a
collection of connected sets in X. Suppose also that $\bigcap_{\alpha \in I} E_\alpha$ is
non-empty. Show that $\bigcup_{\alpha \in I} E_\alpha$ is connected.


Exercise 13.4.7. Let (X, d) be a metric space, and let E be a subset of
X. We say that E is path-connected iff, for every x, y ∈ E, there exists
a continuous function γ : [0, 1] → E from the unit interval [0, 1] to E
such that γ(0) = x and γ(1) = y. Show that every path-connected set
is connected. (The converse is false, but is a bit tricky to show and will
not be detailed here.)


Exercise 13.4.8. Let (X, d) be a metric space, and let E be a subset of X.
Show that if E is connected, then the closure $\overline{E}$ of E is also connected.
Is the converse true?

Exercise 13.4.9. Let (X, d) be a metric space. Let us define a relation
x ~ y on X by declaring x ~ y iff there exists a connected subset of X
which contains both x and y. Show that this is an equivalence relation
(i.e., it obeys the reflexive, symmetric, and transitive axioms). Also,
show that the equivalence classes of this relation (i.e., the sets of the
form {y ∈ X : y ~ x} for some x ∈ X) are all closed and connected.
(Hint: use Exercise 13.4.6 and Exercise 13.4.8.) These sets are known
as the connected components of X.

Exercise 13.4.10. Combine Proposition 13.3.2 and Corollary 13.4.7 to 
deduce a theorem for continuous functions on a compact connected do- 
main which generalizes Corollary 9.7.4. 


13.5 Topological spaces (Optional) 


The concept of a metric space can be generalized to that of a 
topological space. The idea here is not to view the metric d as the 
fundamental object; indeed, in a general topological space there 
is no metric at all. Instead, it is the collection of open sets which 
is the fundamental concept. Thus, whereas in a metric space one 
introduces the metric d first, and then uses the metric to define 
first the concept of an open ball and then the concept of an open 
set, in a topological space one starts just with the notion of an 
open set. As it turns out, starting from the open sets, one cannot 
necessarily reconstruct a usable notion of a ball or metric (thus 
not all topological spaces will be metric spaces), but remarkably 
one can still define many of the concepts in the preceding sections. 

We will not use topological spaces at all in this text, and so 
we shall be rather brief in our treatment of them here. A more 
complete study of these spaces can of course be found in any 
topology textbook, or a more advanced analysis text. 


Definition 13.5.1 (Topological spaces). A topological space is a
pair (X, F), where X is a set, and $F \subseteq 2^X$ is a collection of subsets
of X, whose elements are referred to as open sets. Furthermore,
the collection F must obey the following properties:


• The empty set ∅ and the whole set X are open; in other
words, ∅ ∈ F and X ∈ F.


• Any finite intersection of open sets is open. In other words,
if $V_1, \ldots, V_n$ are elements of F, then $V_1 \cap \ldots \cap V_n$ is also in
F.


• Any arbitrary union of open sets is open (including infinite
unions). In other words, if $(V_\alpha)_{\alpha \in I}$ is a family of sets in F,
then $\bigcup_{\alpha \in I} V_\alpha$ is also in F.


In many cases, the collection F of open sets can be deduced from 
context, and we shall refer to the topological space (X, F) simply 
as X. 
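
For a finite set X, the three axioms of Definition 13.5.1 can be checked mechanically. The following Python sketch is one possible way to do so; the function name is_topology is a hypothetical helper introduced for this illustration. It relies on the observation that, for a finite collection F, closure under pairwise unions and pairwise intersections already implies closure under all unions and all finite intersections (by induction).

```python
from itertools import combinations


def is_topology(X, opens):
    """Check the axioms of a topological space for a finite set X.

    `opens` is an iterable of subsets of X, the candidate open sets.
    """
    X = frozenset(X)
    F = {frozenset(U) for U in opens}
    # The empty set and the whole set must be open.
    if frozenset() not in F or X not in F:
        return False
    # Every open set must be a subset of X.
    if any(not U <= X for U in F):
        return False
    # Pairwise closure is enough because F is finite.
    for U, V in combinations(F, 2):
        if (U & V) not in F or (U | V) not in F:
            return False
    return True


X = {1, 2, 3}
print(is_topology(X, [set(), X]))                # the trivial topology
print(is_topology(X, [set(), {1}, {1, 2}, X]))   # a chain of open sets
print(is_topology(X, [set(), {1}, {2}, X]))      # {1} | {2} is missing
```

The last collection fails because the union of the open sets {1} and {2} is not itself in the collection.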


From Proposition 12.2.15 we see that every metric space (X, d) 
is automatically also a topological space (if we set F equal to the 
collection of sets which are open in (X,d)). However, there do 
exist topological spaces which do not arise from metric spaces
(see Exercises 13.5.1 and 13.5.6).

We now develop the analogues of various notions in this chap- 
ter and the previous chapter for topological spaces. The notion of 
a ball must be replaced by the notion of a neighbourhood. 


Definition 13.5.2 (Neighbourhoods). Let (X, F) be a topological
space, and let x ∈ X. A neighbourhood of x is defined to be
any open set in F which contains x.


Example 13.5.3. If (X, d) is a metric space, x ∈ X, and r > 0,
then B(x, r) is a neighbourhood of x.


Definition 13.5.4 (Topological convergence). Let m be an integer,
(X, F) be a topological space and let $(x^{(n)})_{n=m}^\infty$ be a sequence
of points in X. Let x be a point in X. We say that $(x^{(n)})_{n=m}^\infty$
converges to x if and only if, for every neighbourhood V of x, there
exists an N ≥ m such that $x^{(n)} \in V$ for all n ≥ N.


This notion is consistent with that of convergence in metric
spaces (Exercise 13.5.2). One can then ask whether one has the
basic property of uniqueness of limits (Proposition 12.1.20). The
answer turns out to usually be yes - if the topological space has
an additional property known as the Hausdorff property - but the
answer can be no for other topologies; see Exercise 13.5.4.


Definition 13.5.5 (Interior, exterior, boundary). Let (X, F) be
a topological space, let E be a subset of X, and let $x_0$ be a point
in X. We say that $x_0$ is an interior point of E if there exists a
neighbourhood V of $x_0$ such that V ⊆ E. We say that $x_0$ is an
exterior point of E if there exists a neighbourhood V of $x_0$ such
that V ∩ E = ∅. We say that $x_0$ is a boundary point of E if it is
neither an interior point nor an exterior point of E.


This definition is consistent with the corresponding notion for 
metric spaces (Exercise 13.5.3). 


Definition 13.5.6 (Closure). Let (X, F) be a topological space, let
E be a subset of X, and let $x_0$ be a point in X. We say that $x_0$
is an adherent point of E if every neighbourhood V of $x_0$ has a
non-empty intersection with E. The set of all adherent points of
E is called the closure of E and is denoted $\overline{E}$.


There is a partial analogue of Proposition 12.2.10; see Exercise
13.5.10.

We define a set K in a topological space (X, F) to be closed 
iff its complement X\K is open; this is consistent with the metric 
space definition, thanks to Proposition 12.2.15(e). Some partial 
analogues of that Proposition are true (see Exercise 13.5.11). 

To define the notion of a relative topology, we cannot use De- 
finition 12.3.3 as this requires a metric function. However, we can 
instead use Proposition 12.3.4 as our starting point: 


Definition 13.5.7 (Relative topology). Let (X, F) be a topological
space, and Y be a subset of X. Then we define $F_Y := \{V \cap Y : V \in F\}$,
and refer to this as the topology on Y induced by (X, F).
We call $(Y, F_Y)$ a topological subspace of (X, F). This is indeed a
topological space; see Exercise 13.5.12.
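
On a finite set, the induced topology of Definition 13.5.7 can be computed directly from its definition by intersecting every open set with Y. The Python sketch below illustrates this; the name induced_topology is a hypothetical helper chosen for this example.

```python
def induced_topology(opens, Y):
    """Return the collection {V & Y : V in opens} as a set of frozensets."""
    Y = frozenset(Y)
    return {frozenset(V) & Y for V in opens}


# A topology on X = {1, 2, 3} and the topology it induces on Y = {2, 3}.
F = [set(), {1}, {1, 2}, {1, 2, 3}]
F_Y = induced_topology(F, {2, 3})
print(sorted(sorted(V) for V in F_Y))
```

Note that distinct open sets of X (here the empty set and {1}) may give the same open set of Y, which is why the result is collected into a set.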


From Proposition 12.3.4 we see that this notion is compatible 
with the one for metric spaces. 
Next we define the notion of continuity. 


Definition 13.5.8 (Continuous functions). Let $(X, F_X)$ and $(Y, F_Y)$
be topological spaces, and let f : X → Y be a function. If $x_0 \in X$,
we say that f is continuous at $x_0$ iff for every neighbourhood V of
$f(x_0)$, there exists a neighbourhood U of $x_0$ such that f(U) ⊆ V.
We say that f is continuous iff it is continuous at every point
x ∈ X.


This definition is consistent with that in Definition 13.1.1 (Ex- 
ercise 13.5.15). Partial analogues of Theorems 13.1.4 and 13.1.5 
are available (Exercise 13.5.16). In particular, a function is con-
tinuous iff the pre-image of every open set is open.
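
The pre-image criterion is easy to test on finite topological spaces. The following Python sketch is an illustration under the assumption that the map f is stored as a dictionary and the topologies as lists of sets; the names preimage and is_continuous are hypothetical helpers introduced here.

```python
def preimage(f, domain, V):
    """Return the set of points x in the domain with f[x] in V."""
    return frozenset(x for x in domain if f[x] in V)


def is_continuous(f, X, opens_X, opens_Y):
    """Check that the pre-image of every open set of Y is open in X."""
    FX = {frozenset(U) for U in opens_X}
    return all(preimage(f, X, V) in FX for V in opens_Y)


X = {1, 2}
trivial_X = [set(), {1, 2}]                        # trivial topology on X
discrete_Y = [set(), {"a"}, {"b"}, {"a", "b"}]     # discrete topology on Y

print(is_continuous({1: "a", 2: "a"}, X, trivial_X, discrete_Y))  # constant map
print(is_continuous({1: "a", 2: "b"}, X, trivial_X, discrete_Y))  # non-constant map
```

In this small example the constant map passes the test, while the non-constant map fails because the pre-image of the open set {"a"} is {1}, which is not open in the trivial topology.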

There is unfortunately no notion of a Cauchy sequence, a com- 
plete space, or a bounded space, for topological spaces. However, 
there is certainly a notion of a compact space, as we can see by 
taking Theorem 12.5.8 as our starting point: 


Definition 13.5.9 (Compact topological spaces). Let (X,F) be 
a topological space. We say that this space is compact if every 
open cover of X has a finite subcover. If Y is a subset of X, we 
say that Y is compact if the topological space on Y induced by 
(X, F) is compact. 


Many basic facts about compact metric spaces continue to hold 
true for compact topological spaces, notably Theorem 13.3.1 and 
Proposition 13.3.2 (Exercise 13.5.17). However, there is no notion 
of uniform continuity, and so there is no analogue of Theorem 
13.3.5. 

We can also define the notion of connectedness by repeating 
Definition 13.4.1 verbatim, and also repeating Definition 13.4.3 
(but with Definition 13.5.7 instead of Definition 12.3.3). Many 
of the results and exercises in Section 13.4 continue to hold for 
topological spaces (with almost no changes to any of the proofs!). 


Exercise 13.5.1. Let X be an arbitrary set, and let F := {∅, X}. Show
that (X, F) is a topology (called the trivial topology on X). If X con-
tains more than one element, show that the trivial topology cannot be
obtained by placing a metric d on X. Show that this topological
space is both compact and connected.


Exercise 13.5.2. Let (X, d) be a metric space (and hence a topological
space). Show that the two notions of convergence of sequences in
Definition 12.1.14 and Definition 13.5.4 coincide.


Exercise 13.5.3. Let (X,d) be a metric space (and hence a topological 
space). Show that the two notions of interior, exterior, and boundary in 
Definition 12.2.5 and Definition 13.5.5 coincide. 


Exercise 13.5.4. A topological space (X, F) is said to be Hausdorff if
given any two distinct points x, y ∈ X, there exists a neighbourhood
V of x and a neighbourhood W of y such that V ∩ W = ∅. Show
that any topological space coming from a metric space is Hausdorff, and
show that the trivial topology is not Hausdorff. Show that the analogue
of Proposition 12.1.20 holds for Hausdorff topological spaces, but give
an example of a non-Hausdorff topological space in which Proposition
12.1.20 fails. (In practice, most topological spaces one works with are
Hausdorff; non-Hausdorff topological spaces tend to be so pathological
that it is not very profitable to work with them.)


Exercise 13.5.5. Given any totally ordered set X with order relation ≤,
declare a set V ⊆ X to be open if for every x ∈ V there exist a, b ∈ X
such that the "interval" {y ∈ X : a < y < b} contains x and is contained
in V. Let F be the set of all open subsets of X. Show that (X, F) is
a topology (this is the order topology on the totally ordered set (X, ≤))
which is Hausdorff in the sense of Exercise 13.5.4. Show that on the real
line R (with the standard ordering ≤), the order topology matches the
standard topology (i.e., the topology arising from the standard metric).
If instead one applies this to the extended real line R*, show that R
is an open set with boundary {−∞, +∞}. If $(x_n)_{n=1}^\infty$ is a sequence of
numbers in R (and hence in R*), show that $x_n$ converges to +∞ if and
only if $\liminf_{n \to \infty} x_n = +\infty$, and $x_n$ converges to −∞ if and only if
$\limsup_{n \to \infty} x_n = -\infty$.


Exercise 13.5.6. Let X be an uncountable set, and let F be the collection
of all subsets E in X which are either empty or co-finite (which means
that X\E is finite). Show that (X, F) is a topology (this is called the
cofinite topology on X) which is not Hausdorff in the sense of Exercise 13.5.4,
and is compact and connected. Also, show that if x ∈ X and $(V_n)_{n=1}^\infty$ is any
countable collection of open sets containing x, then $\bigcap_{n=1}^\infty V_n \neq \{x\}$. Use
this to show that the cofinite topology cannot be obtained by placing
a metric d on X. (Hint: what is the set $\bigcap_{n=1}^\infty B(x, 1/n)$ equal to in a
metric space?)


Exercise 13.5.7. Let X be an uncountable set, and let F be the collection
of all subsets E in X which are either empty or co-countable (which
means that X\E is at most countable). Show that (X, F) is a topology
(this is called the cocountable topology on X) which is not Hausdorff in the
sense of Exercise 13.5.4, and is connected, but cannot arise from a metric
space and is not compact.


Exercise 13.5.8. Let X be an uncountable set, and let ∞ be an element of
X. Let F be the collection of all subsets E in X which are either empty,
or are co-countable and contain ∞. Show that (X, F) is a compact
topological space; however show that not every sequence in X has a
convergent subsequence.


Exercise 13.5.9. Let (X, F) be a compact topological space. Show that 
every sequence in X has a convergent subsequence, by modifying Exer- 
cise 12.5.11. Explain why this does not contradict Exercise 13.5.8. 


Exercise 13.5.10. Prove the following partial analogue of Proposition
12.2.10 for topological spaces: (c) implies both (a) and (b), which are
equivalent to each other. Show that in the co-countable topology in
Exercise 13.5.7, it is possible for (a) and (b) to hold without (c) holding.


Exercise 13.5.11. Let E be a subset of a topological space (X, F). Show 
that E is open if and only if every element of E is an interior point, 
and show that E is closed if and only if E contains all of its adherent 
points. Prove analogues of Proposition 12.2.15(e)-(h) (some of these are 
automatic by definition). If we assume in addition that X is Hausdorff, 
prove an analogue of Proposition 12.2.15(d) also, but give an example 
to show that (d) can fail when X is not Hausdorff. 


Exercise 13.5.12. Show that the pair $(Y, F_Y)$ defined in Definition 13.5.7
is indeed a topological space.

Exercise 13.5.13. Generalize Corollary 12.5.9 to compact sets in a topo- 
logical space. 

Exercise 13.5.14. Generalize Theorem 12.5.10 to compact sets in a topo- 
logical space. 


Exercise 13.5.15. Let $(X, d_X)$ and $(Y, d_Y)$ be metric spaces (and hence
topological spaces). Show that the two notions of continuity (both at a
point, and on the whole domain) of a function f : X → Y in Definition
13.1.1 and Definition 13.5.8 coincide.


Exercise 13.5.16. Show that when Theorem 13.1.4 is extended to topo-
logical spaces, that (a) implies (b). (The converse is false, but construct-
ing an example is difficult.) Show that when Theorem 13.1.5 is extended
to topological spaces, that (a), (c), (d) are all equivalent to each other,
and imply (b). (Again, the converse implications are false, but difficult
to prove.)


Exercise 13.5.17. Generalize both Theorem 13.3.1 and Proposition 13.3.2 
to compact sets in a topological space. 


Chapter 14 


Uniform convergence 


In the previous two chapters we have seen what it means for a
sequence $(x^{(n)})_{n=1}^\infty$ of points in a metric space $(X, d_X)$ to converge
to a limit x; it means that $\lim_{n \to \infty} d_X(x^{(n)}, x) = 0$, or equiv-
alently that for every ε > 0 there exists an N > 0 such that
$d_X(x^{(n)}, x) < \varepsilon$ for all n > N. (We have also generalized the
notion of convergence to topological spaces (X, F), but in this
chapter we will focus on metric spaces.)


In this chapter, we consider what it means for a sequence
of functions $(f^{(n)})_{n=1}^\infty$ from one metric space $(X, d_X)$ to another
$(Y, d_Y)$ to converge. In other words, we have a sequence of func-
tions $f^{(1)}, f^{(2)}, \ldots$, with each function $f^{(n)} : X \to Y$ being a
function from X to Y, and we ask what it means for this sequence
of functions to converge to some limiting function f.


It turns out that there are several different concepts of con- 
vergence of functions; here we describe the two most important 
ones, pointwise convergence and uniform convergence. (There are 
other types of convergence for functions, such as $L^1$ convergence,
$L^2$ convergence, convergence in measure, almost everywhere con-
vergence, and so forth, but these are beyond the scope of this
text.) The two notions are related, but not identical; the relation- 
ship between the two is somewhat analogous to the relationship 
between continuity and uniform continuity. 


Once we work out what convergence means for functions, and
thus can make sense of such statements as $\lim_{n \to \infty} f^{(n)} = f$, we
will then ask how these limits interact with other concepts. For
instance, we already have a notion of limiting values of functions:
$\lim_{x \to x_0; x \in X} f(x)$. Can we interchange limits, i.e.


$$\lim_{n \to \infty} \lim_{x \to x_0; x \in X} f^{(n)}(x) = \lim_{x \to x_0; x \in X} \lim_{n \to \infty} f^{(n)}(x)?$$


As we shall see, the answer depends on what type of convergence
we have for $f^{(n)}$. We will also address similar questions involving
interchanging limits and integrals, or limits and sums, or sums
and integrals.


14.1 Limiting values of functions 


Before we talk about limits of sequences of functions, we should 
first discuss a similar, but distinct, notion, that of limiting values 
of functions. We shall focus on the situation for metric spaces, but 
there are similar notions for topological spaces (Exercise 14.1.3). 


Definition 14.1.1 (Limiting value of a function). Let $(X, d_X)$
and $(Y, d_Y)$ be metric spaces, let E be a subset of X, and let
f : X → Y be a function. If $x_0 \in X$ is an adherent point of E,
and L ∈ Y, we say that f(x) converges to L in Y as x converges
to $x_0$ in E, or write $\lim_{x \to x_0; x \in E} f(x) = L$, if for every ε > 0 there
exists a δ > 0 such that $d_Y(f(x), L) < \varepsilon$ for all x ∈ E such that
$d_X(x, x_0) < \delta$.


Remark 14.1.2. Some authors exclude the case $x = x_0$ from the
above definition, thus requiring $0 < d_X(x, x_0) < \delta$. In our current
notation, this would correspond to removing $x_0$ from E, thus one
would consider $\lim_{x \to x_0; x \in E \setminus \{x_0\}} f(x)$ instead of $\lim_{x \to x_0; x \in E} f(x)$.
See Exercise 14.1.1 for a comparison of the two concepts.


Comparing this with Definition 13.1.1, we see that f is con-
tinuous at $x_0$ if and only if


$$\lim_{x \to x_0; x \in X} f(x) = f(x_0).$$


Thus f is continuous on X iff we have


$$\lim_{x \to x_0; x \in X} f(x) = f(x_0) \text{ for all } x_0 \in X.$$


Example 14.1.3. If f : R → R is the function $f(x) = x^2 - 4$,
then

$$\lim_{x \to 1} f(x) = f(1) = 1 - 4 = -3$$


since f is continuous.


Remark 14.1.4. Often we shall omit the condition x ∈ X, and
abbreviate $\lim_{x \to x_0; x \in X} f(x)$ as simply $\lim_{x \to x_0} f(x)$ when it is
clear what space x will range in.


One can rephrase Definition 14.1.1 in terms of sequences: 


Proposition 14.1.5. Let $(X, d_X)$ and $(Y, d_Y)$ be metric spaces,
let E be a subset of X, and let f : X → Y be a function. Let
$x_0 \in X$ be an adherent point of E and L ∈ Y. Then the following
four statements are logically equivalent:


(a) $\lim_{x \to x_0; x \in E} f(x) = L$.


(b) For every sequence $(x^{(n)})_{n=1}^\infty$ in E which converges to $x_0$ with
respect to the metric $d_X$, the sequence $(f(x^{(n)}))_{n=1}^\infty$ converges
to L with respect to the metric $d_Y$.


(c) For every open set V ⊆ Y which contains L, there exists an
open set U ⊆ X containing $x_0$ such that f(U ∩ E) ⊆ V.


(d) If one defines the function $g : E \cup \{x_0\} \to Y$ by defining
$g(x_0) := L$, and g(x) := f(x) for $x \in E \setminus \{x_0\}$, then g is
continuous at $x_0$.


Proof. See Exercise 14.1.2. □


Remark 14.1.6. Observe from Proposition 14.1.5(b) and Propo-
sition 12.1.20 that a function f(x) can converge to at most one
limit L as x converges to $x_0$. In other words, if the limit


$$\lim_{x \to x_0; x \in E} f(x)$$


exists at all, then it can only take at most one value.


Remark 14.1.7. The requirement that $x_0$ be an adherent point
of E is necessary for the concept of limiting value to be useful;
otherwise $x_0$ will lie in the exterior of E, and the notion that f(x)
converges to L as x converges to $x_0$ in E is vacuous (for δ suffi-
ciently small, there are no points x ∈ E so that $d(x, x_0) < \delta$).


Remark 14.1.8. Strictly speaking, we should write


$$d_Y\text{-}\lim_{x \to x_0; x \in E} f(x) \quad \text{instead of} \quad \lim_{x \to x_0; x \in E} f(x),$$


since the convergence depends on the metric $d_Y$. However in prac-
tice it will be obvious what the metric $d_Y$ is and so we will omit
the $d_Y$- prefix from the notation.


Exercise 14.1.1. Let $(X, d_X)$ and $(Y, d_Y)$ be metric spaces, let E be a
subset of X, let f : E → Y be a function, and let $x_0$ be an element of
E. Show that the limit $\lim_{x \to x_0; x \in E} f(x)$ exists if and only if the limit
$\lim_{x \to x_0; x \in E \setminus \{x_0\}} f(x)$ exists and is equal to $f(x_0)$. Also, show that if
the limit $\lim_{x \to x_0; x \in E} f(x)$ exists at all, then it must equal $f(x_0)$.


Exercise 14.1.2. Prove Proposition 14.1.5. (Hint: review your proof of 
Theorem 13.1.4.) 


Exercise 14.1.3. Use Proposition 14.1.5(c) to define a notion of a limiting 
value of a function f : X — Y from one topological space (X, Fx) to an- 
other (Y, Fy). Then prove the equivalence of Proposition 14.1.5(c) and 
14.1.5(d). If in addition Y is a Hausdorff topological space (see Exercise 
13.5.4), prove an analogue of Remark 14.1.6. Is the same statement true 
if Y is not Hausdorff? 


Exercise 14.1.4. Recall from Exercise 13.5.5 that the extended real line
R* comes with a standard topology (the order topology). We view the
natural numbers N as a subspace of this topological space, and +∞ as an
adherent point of N in R*. Let $(a_n)_{n=0}^\infty$ be a sequence taking values in a
topological space $(Y, F_Y)$, and let L ∈ Y. Show that $\lim_{n \to +\infty; n \in \mathbf{N}} a_n = L$
(in the sense of Exercise 14.1.3) if and only if $\lim_{n \to \infty} a_n = L$ (in the
sense of Definition 13.5.4). This shows that the notions of limiting values
of a sequence, and limiting values of a function, are compatible.

Exercise 14.1.5. Let $(X, d_X)$, $(Y, d_Y)$, $(Z, d_Z)$ be metric spaces, and
let $x_0 \in X$, $y_0 \in Y$, $z_0 \in Z$. Let f : X → Y and g : Y → Z be
functions, and let E be a set. If we have $\lim_{x \to x_0; x \in E} f(x) = y_0$ and
$\lim_{y \to y_0; y \in f(E)} g(y) = z_0$, conclude that $\lim_{x \to x_0; x \in E} g \circ f(x) = z_0$.


Exercise 14.1.6. State and prove an analogue of the limit laws in Propo- 
sition 9.3.14 when X is now a metric space rather than a subset of R. 
(Hint: use Corollary 13.2.3.) 


14.2 Pointwise and uniform convergence 


The most obvious notion of convergence of functions is pointwise 
convergence, or convergence at each point of the domain: 


Definition 14.2.1 (Pointwise convergence). Let $(f^{(n)})_{n=1}^\infty$ be a
sequence of functions from one metric space $(X, d_X)$ to another
$(Y, d_Y)$, and let f : X → Y be another function. We say that
$(f^{(n)})_{n=1}^\infty$ converges pointwise to f on X if we have

$$\lim_{n \to \infty} f^{(n)}(x) = f(x)$$

for all x ∈ X, i.e.

$$\lim_{n \to \infty} d_Y(f^{(n)}(x), f(x)) = 0.$$


Or in other words, for every x and every ε > 0 there exists N > 0
such that $d_Y(f^{(n)}(x), f(x)) < \varepsilon$ for every n > N. We call the
function f the pointwise limit of the functions $f^{(n)}$.


Remark 14.2.2. Note that $f^{(n)}(x)$ and f(x) are points in Y,
rather than functions, so we are using our prior notion of con- 
vergence in metric spaces to determine convergence of functions. 
Also note that we are not really using the fact that (X,dx) is a 
metric space (i.e., we are not using the metric dx); for this de- 
finition it would suffice for X to just be a plain old set with no 
metric structure. However, later on we shall want to restrict our 
attention to continuous functions from X to Y, and in order to 
do so we need a metric on X (and on Y), or at least a topological 
structure. Also when we introduce the concept of uniform con- 
vergence, then we will definitely need a metric structure on X and 
Y; there is no comparable notion for topological spaces. 


Example 14.2.3. Consider the functions $f^{(n)} : \mathbf{R} \to \mathbf{R}$ defined
by $f^{(n)}(x) := x/n$, while f : R → R is the zero function f(x) :=
0. Then $f^{(n)}$ converges pointwise to f, since for each fixed real
number x we have $\lim_{n \to \infty} f^{(n)}(x) = \lim_{n \to \infty} x/n = 0 = f(x)$.


From Proposition 12.1.20 we see that a sequence $(f^{(n)})_{n=1}^\infty$ of
functions from one metric space $(X, d_X)$ to another $(Y, d_Y)$ can
have at most one pointwise limit f (this explains why we can refer
to f as the pointwise limit). However, it is of course possible for
a sequence of functions to have no pointwise limit (can you think
of an example?), just as a sequence of points in a metric space does
not necessarily have a limit.

Pointwise convergence is a very natural concept, but it has a 
number of disadvantages: it does not preserve continuity, deriva- 
tives, limits, or integrals, as the following three examples show. 


Example 14.2.4. Consider the functions $f^{(n)} : [0,1] \to \mathbf{R}$ defined
by $f^{(n)}(x) := x^n$, and let f : [0, 1] → R be the function defined
by setting f(x) := 1 when x = 1 and f(x) := 0 when 0 ≤ x < 1.
Then the functions $f^{(n)}$ are continuous, and converge pointwise to
f on [0, 1] (why? treat the cases x = 1 and 0 ≤ x < 1 separately);
however the limiting function f is not continuous. Note that the
same example shows that pointwise convergence does not preserve
differentiability either.
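
The behaviour in Example 14.2.4 can be observed numerically. The following Python sketch simply tabulates $x^n$ at a few sample points for increasing n; the names f_n and pointwise_limit are hypothetical helpers for this illustration, and floating-point values are of course only approximations to the true limits.

```python
def f_n(n, x):
    """The function f^(n)(x) = x^n on [0, 1]."""
    return x ** n


def pointwise_limit(x):
    """The pointwise limit f: equal to 1 at x = 1 and 0 on [0, 1)."""
    return 1.0 if x == 1 else 0.0


for x in (0.0, 0.5, 0.9, 0.99, 1.0):
    values = [f_n(n, x) for n in (1, 10, 100, 1000, 10000)]
    print(x, values, "limit:", pointwise_limit(x))
```

Each row settles down to the value of the limit function at that point, but the rows for x close to 1 need far larger n before they get small, while the row for x = 1 is constantly 1. This is the jump that makes the limit discontinuous.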


Example 14.2.5. If $\lim_{x \to x_0; x \in E} f^{(n)}(x) = L$ for every n, and
$f^{(n)}$ converges pointwise to f, we cannot always conclude
that $\lim_{x \to x_0; x \in E} f(x) = L$. The previous example is also a
counterexample here: observe that $\lim_{x \to 1; x \in [0,1)} x^n = 1$ for every
n, but $x^n$ converges pointwise to the function f defined in the
previous example, and $\lim_{x \to 1; x \in [0,1)} f(x) = 0$. In particular, we
see that


$$\lim_{n \to \infty} \lim_{x \to x_0; x \in X} f^{(n)}(x) \neq \lim_{x \to x_0; x \in X} \lim_{n \to \infty} f^{(n)}(x)$$


(cf. Example 1.2.8). Thus pointwise convergence does not pre-
serve limits.


Example 14.2.6. Suppose that $f^{(n)} : [a,b] \to \mathbf{R}$ is a sequence of
Riemann-integrable functions on the interval [a, b]. If $\int_{[a,b]} f^{(n)} = L$
for every n, and $f^{(n)}$ converges pointwise to some new function
f, this does not mean that $\int_{[a,b]} f = L$. An example comes by
setting [a, b] := [0, 1], and letting $f^{(n)}$ be the function $f^{(n)}(x) := 2n$
when $x \in [1/(2n), 1/n]$, and $f^{(n)}(x) := 0$ for all other values of
x. Then $f^{(n)}$ converges pointwise to the zero function f(x) := 0
(why?). On the other hand, $\int_{[0,1]} f^{(n)} = 1$ for every n, while
$\int_{[0,1]} f = 0$. In particular, we have an example where


$$\lim_{n \to \infty} \int_{[a,b]} f^{(n)} \neq \int_{[a,b]} \lim_{n \to \infty} f^{(n)}.$$


One may think that this counterexample has something to do
with the $f^{(n)}$ being discontinuous, but one can easily modify this
counterexample to make the $f^{(n)}$ continuous (can you see how?).
Another example in the same spirit is the "moving bump" ex-
ample. Let $f^{(n)} : \mathbf{R} \to \mathbf{R}$ be the function defined by $f^{(n)}(x) := 1$
if x ∈ [n, n + 1] and $f^{(n)}(x) := 0$ otherwise. Then $\int_{\mathbf{R}} f^{(n)} = 1$
for every n (where $\int_{\mathbf{R}} f$ is defined as the limit of $\int_{[-N,N]} f$ as N
goes to infinity). On the other hand, $f^{(n)}$ converges pointwise to
the zero function 0 (why?), and $\int_{\mathbf{R}} 0 = 0$. In both of these exam-
ples, functions of area 1 have somehow "disappeared" to produce
functions of area 0 in the limit. See also Example 1.2.9.
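
The first counterexample in Example 14.2.6 can be checked with exact rational arithmetic. In the Python sketch below, the names spike and spike_integral are hypothetical helpers; the integral is computed from the area of the single rectangle of height 2n over $[1/(2n), 1/n]$ rather than by numerical quadrature.

```python
from fractions import Fraction


def spike(n, x):
    """f^(n)(x) = 2n when 1/(2n) <= x <= 1/n, and 0 otherwise."""
    return 2 * n if Fraction(1, 2 * n) <= x <= Fraction(1, n) else 0


def spike_integral(n):
    """Integral of f^(n) over [0, 1]: height 2n times width 1/n - 1/(2n)."""
    return 2 * n * (Fraction(1, n) - Fraction(1, 2 * n))


# Every f^(n) has integral exactly 1 ...
print([spike_integral(n) for n in (1, 2, 10, 1000)])

# ... yet at any fixed x the values f^(n)(x) are eventually 0.
x = Fraction(1, 10)
print([spike(n, x) for n in range(1, 16)])
```

At the fixed point x = 1/10 the spike passes over x only for 5 ≤ n ≤ 10; for every n > 10 the value is 0, even though the area under each $f^{(n)}$ remains 1.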


These examples show that pointwise convergence is too weak a
concept to be of much use. The problem is that while $f^{(n)}(x)$ converges
to f(x) for each x, the rate of that convergence varies substantially
with x. For instance, consider the first example where
$f^{(n)} : [0,1] \to \mathbf{R}$ was the function $f^{(n)}(x) := x^n$, and $f : [0,1] \to \mathbf{R}$
was the function such that f(x) := 1 when x = 1, and f(x) := 0
otherwise. Then for each x, $f^{(n)}(x)$ converges to f(x) as n → ∞;
this is the same as saying that $\lim_{n \to \infty} x^n = 0$ when 0 ≤ x < 1,
and that $\lim_{n \to \infty} x^n = 1$ when x = 1. But the convergence is
much slower near 1 than far away from 1. For instance, consider
the statement that $\lim_{n \to \infty} x^n = 0$ for all 0 ≤ x < 1. This means,
for every 0 ≤ x < 1, that for every ε > 0, there exists an N ≥ 1 such
that $|x^n| < \varepsilon$ for all n ≥ N - or in other words, the sequence
$1, x, x^2, x^3, \ldots$ will eventually get less than ε, after passing some
finite number N of elements in this sequence. But the number
of elements N one needs to go out to depends very much on the
location of x. For instance, take ε := 0.1. If x = 0.1, then we
have $|x^n| < \varepsilon$ for all n ≥ 2 - the sequence gets underneath ε after
the second element. But if x = 0.5, then we only get $|x^n| < \varepsilon$ for
n ≥ 4 - you have to wait until the fourth element to get within
ε of the limit. And if x = 0.9, then one only has $|x^n| < \varepsilon$ when
n ≥ 22. Clearly, the closer x gets to 1, the longer one has to
wait until $f^{(n)}(x)$ will get within ε of f(x), although it still will
get there eventually. (Curiously, however, while the convergence
gets worse and worse as x approaches 1, the convergence suddenly
becomes perfect when x = 1.)
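
The thresholds quoted above can be reproduced with a short computation. The following Python sketch uses exact rational arithmetic to find the smallest n with $x^n < \varepsilon$; the name first_index_below is a hypothetical helper for this illustration, and it assumes 0 ≤ x < 1 and ε > 0 (otherwise the loop need not terminate).

```python
from fractions import Fraction


def first_index_below(x, eps):
    """Smallest n >= 1 with x**n < eps, assuming 0 <= x < 1 and eps > 0."""
    n, power = 1, x
    while power >= eps:
        n += 1
        power *= x
    return n


eps = Fraction(1, 10)
for x in (Fraction(1, 10), Fraction(1, 2), Fraction(9, 10), Fraction(99, 100)):
    print(float(x), first_index_below(x, eps))
```

For x = 0.1, 0.5 and 0.9 this gives the values 2, 4 and 22 from the text, and the answer keeps growing as x is moved closer to 1, so no single N works for all x in [0, 1) at once.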

To put things another way, the convergence of $f^{(n)}$ to f is not
uniform in x - the N that one needs to get $f^{(n)}(x)$ within ε of f(x)
depends on x as well as on ε. This motivates a stronger notion of
convergence.


Definition 14.2.7 (Uniform convergence). Let $(f^{(n)})_{n=1}^\infty$ be a
sequence of functions from one metric space $(X, d_X)$ to another
$(Y, d_Y)$, and let f : X → Y be another function. We say that
$(f^{(n)})_{n=1}^\infty$ converges uniformly to f on X if for every ε > 0 there
exists N > 0 such that $d_Y(f^{(n)}(x), f(x)) < \varepsilon$ for every n > N and
x ∈ X. We call the function f the uniform limit of the functions
$f^{(n)}$.


Remark 14.2.8. Note that this definition is subtly different from 
the definition for pointwise convergence in Definition 14.2.1. In 
the definition of pointwise convergence, N was allowed to depend 
on x; now it is not. The reader should compare this distinction 
to the distinction between continuity and uniform continuity (i.e., 
between Definition 13.1.1 and Definition 13.3.4). A more precise 
formulation of this analogy is given in Exercise 14.2.1. 


It is easy to see that if $f^{(n)}$ converges uniformly to f on X, then
it also converges pointwise to the same function f (see Exercise
14.2.2); thus when the uniform limit and pointwise limit both
exist, then they have to be equal. However, the converse is not
true; for instance the functions $f^{(n)} : [0,1] \to \mathbf{R}$ defined earlier by
$f^{(n)}(x) := x^n$ converge pointwise, but do not converge uniformly
(see Exercise 14.2.2).


Example 14.2.9. Let $f^{(n)} : [0,1] \to \mathbf{R}$ be the functions $f^{(n)}(x) :=
x/n$, and let $f : [0,1] \to \mathbf{R}$ be the zero function $f(x) := 0$. Then
it is clear that $f^{(n)}$ converges to $f$ pointwise. Now we show that
in fact $f^{(n)}$ converges to $f$ uniformly. We have to show that for
every $\varepsilon > 0$, there exists an $N$ such that $|f^{(n)}(x) - f(x)| < \varepsilon$ for
every $x \in [0,1]$ and every $n > N$. To show this, let us fix an $\varepsilon > 0$.
Then for any $x \in [0,1]$ and $n > N$, we have

$$|f^{(n)}(x) - f(x)| = |x/n - 0| = x/n \leq 1/n < 1/N.$$

Thus if we choose $N$ such that $N > 1/\varepsilon$ (note that this choice of $N$
does not depend on what $x$ is), then we have $|f^{(n)}(x) - f(x)| < \varepsilon$
for all $n > N$ and $x \in [0,1]$, as desired.
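The choice of $N$ in this example can be illustrated with a small sketch. It samples $[0,1]$ on a grid (so it only estimates the supremum from below), and the helper names `sup_error` and `uniform_threshold` are hypothetical.

```python
import math


def sup_error(n: int, samples: int = 1001) -> float:
    """Largest value of |x/n - 0| over an evenly spaced grid in [0, 1]."""
    return max(abs((k / (samples - 1)) / n) for k in range(samples))


def uniform_threshold(eps: float) -> int:
    """One N with N > 1/eps; note that it is computed without reference to x."""
    return math.floor(1 / eps) + 1


eps = 0.25
N = uniform_threshold(eps)
# For every n > N the worst-case error over the whole grid is below eps.
print(N, all(sup_error(n) < eps for n in range(N + 1, N + 50)))
```

The point of the sketch is that a single threshold serves every sample point simultaneously, in contrast to the sequence $x^n$.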


We make one trivial remark here: if a sequence $f^{(n)} : X \to
Y$ of functions converges pointwise (or uniformly) to a function
$f : X \to Y$, then the restrictions $f^{(n)}|_E : E \to Y$ of $f^{(n)}$ to some
subset $E$ of $X$ will also converge pointwise (or uniformly) to $f|_E$.
(Why?)


Exercise 14.2.1. The purpose of this exercise is to demonstrate a concrete
relationship between continuity and pointwise convergence, and
between uniform continuity and uniform convergence. Let $f : \mathbf{R} \to \mathbf{R}$
be a function. For any $a \in \mathbf{R}$, let $f_a : \mathbf{R} \to \mathbf{R}$ be the shifted function
$f_a(x) := f(x - a)$.

(a) Show that $f$ is continuous if and only if, whenever $(a_n)_{n=0}^\infty$ is
a sequence of real numbers which converges to zero, the shifted
functions $f_{a_n}$ converge pointwise to $f$.

(b) Show that $f$ is uniformly continuous if and only if, whenever
$(a_n)_{n=0}^\infty$ is a sequence of real numbers which converges to zero,
the shifted functions $f_{a_n}$ converge uniformly to $f$.




Exercise 14.2.2. (a) Let $(f^{(n)})_{n=1}^\infty$ be a sequence of functions from
one metric space $(X, d_X)$ to another $(Y, d_Y)$, and let $f : X \to Y$
be another function from $X$ to $Y$. Show that if $f^{(n)}$ converges
uniformly to $f$, then $f^{(n)}$ also converges pointwise to $f$.

(b) For each integer $n \geq 1$, let $f^{(n)} : (-1,1) \to \mathbf{R}$ be the function
$f^{(n)}(x) := x^n$. Prove that $f^{(n)}$ converges pointwise to the zero
function 0, but does not converge uniformly to any function $f :
(-1,1) \to \mathbf{R}$.

(c) Let $g : (-1,1) \to \mathbf{R}$ be the function $g(x) := x/(1 - x)$. With the
notation as in (b), show that the partial sums $\sum_{n=1}^N f^{(n)}$ converge
pointwise as $N \to \infty$ to $g$, but do not converge uniformly to $g$,
on the open interval $(-1,1)$. (Hint: use Lemma 7.3.3.) What
would happen if we replaced the open interval $(-1,1)$ with the
closed interval $[-1,1]$?

Exercise 14.2.3. Let $(X, d_X)$ be a metric space, and for every integer $n \geq 1$,
let $f_n : X \to \mathbf{R}$ be a real-valued function. Suppose that $f_n$ converges
pointwise to another function $f : X \to \mathbf{R}$ on $X$ (in this question we give
$\mathbf{R}$ the standard metric $d(x,y) = |x - y|$). Let $h : \mathbf{R} \to \mathbf{R}$ be a continuous
function. Show that the functions $h \circ f_n$ converge pointwise to $h \circ f$ on
$X$, where $h \circ f_n : X \to \mathbf{R}$ is the function $h \circ f_n(x) := h(f_n(x))$, and
similarly for $h \circ f$.

Exercise 14.2.4. Let $f_n : X \to Y$ be a sequence of bounded functions
from one metric space $(X, d_X)$ to another metric space $(Y, d_Y)$. Suppose
that $f_n$ converges uniformly to another function $f : X \to Y$. Suppose
that $f$ is a bounded function; i.e., there exists a ball $B_{(Y,d_Y)}(y_0, R)$ in $Y$
such that $f(x) \in B_{(Y,d_Y)}(y_0, R)$ for all $x \in X$. Show that the sequence
$f_n$ is uniformly bounded; i.e., there exists a ball $B_{(Y,d_Y)}(y_0, R)$ in $Y$ such
that $f_n(x) \in B_{(Y,d_Y)}(y_0, R)$ for all $x \in X$ and all positive integers $n$.


14.3 Uniform convergence and continuity 


We now give the first demonstration that uniform convergence is 
significantly better than pointwise convergence. Specifically, we 
show that the uniform limit of continuous functions is continuous. 


Theorem 14.3.1 (Uniform limits preserve continuity I). Suppose
$(f^{(n)})_{n=1}^\infty$ is a sequence of functions from one metric space $(X, d_X)$
to another $(Y, d_Y)$, and suppose that this sequence converges uniformly
to another function $f : X \to Y$. Let $x_0$ be a point in $X$.
If the functions $f^{(n)}$ are continuous at $x_0$ for each $n$, then the
limiting function $f$ is also continuous at $x_0$.


Proof. See Exercise 14.3.1. □

This has an immediate corollary:


Corollary 14.3.2 (Uniform limits preserve continuity II). Let
$(f^{(n)})_{n=1}^\infty$ be a sequence of functions from one metric space $(X, d_X)$
to another $(Y, d_Y)$, and suppose that this sequence converges uniformly
to another function $f : X \to Y$. If the functions $f^{(n)}$ are
continuous on $X$ for each $n$, then the limiting function $f$ is also
continuous on $X$.


This should be contrasted with Example 14.2.4. There is a 
slight variant of Theorem 14.3.1 which is also useful: 


Proposition 14.3.3 (Interchange of limits and uniform limits).
Let $(X, d_X)$ and $(Y, d_Y)$ be metric spaces, with $Y$ complete, and let
$E$ be a subset of $X$. Let $(f^{(n)})_{n=1}^\infty$ be a sequence of functions from
$E$ to $Y$, and suppose that this sequence converges uniformly in $E$
to some function $f : E \to Y$. Let $x_0 \in X$ be an adherent point
of $E$, and suppose that for each $n$ the limit $\lim_{x \to x_0; x \in E} f^{(n)}(x)$
exists. Then the limit $\lim_{x \to x_0; x \in E} f(x)$ also exists, and is equal to
the limit of the sequence $(\lim_{x \to x_0; x \in E} f^{(n)}(x))_{n=1}^\infty$; in other words
we have the interchange of limits

$$\lim_{n \to \infty} \lim_{x \to x_0; x \in E} f^{(n)}(x) = \lim_{x \to x_0; x \in E} \lim_{n \to \infty} f^{(n)}(x).$$

Proof. See Exercise 14.3.2. □


This should be contrasted with Example 14.2.5. Finally, we
have a version of these theorems for sequences:


Proposition 14.3.4. Let $(f^{(n)})_{n=1}^\infty$ be a sequence of continuous
functions from one metric space $(X, d_X)$ to another $(Y, d_Y)$, and
suppose that this sequence converges uniformly to another function
$f : X \to Y$. Let $x^{(n)}$ be a sequence of points in $X$ which converge
to some limit $x$. Then $f^{(n)}(x^{(n)})$ converges (in $Y$) to $f(x)$.

Proof. See Exercise 14.3.4. □

A similar result holds for bounded functions:


Definition 14.3.5 (Bounded functions). A function $f : X \to Y$
from one metric space $(X, d_X)$ to another $(Y, d_Y)$ is bounded if
$f(X)$ is a bounded set, i.e., there exists a ball $B_{(Y,d_Y)}(y_0, R)$ in $Y$
such that $f(x) \in B_{(Y,d_Y)}(y_0, R)$ for all $x \in X$.


Proposition 14.3.6 (Uniform limits preserve boundedness). Let
$(f^{(n)})_{n=1}^\infty$ be a sequence of functions from one metric space $(X, d_X)$
to another $(Y, d_Y)$, and suppose that this sequence converges uniformly
to another function $f : X \to Y$. If the functions $f^{(n)}$
are bounded on $X$ for each $n$, then the limiting function $f$ is also
bounded on $X$.

Proof. See Exercise 14.3.6. □


Remark 14.3.7. The above propositions sound very reasonable,
but one should caution that they only work if one assumes uniform
convergence; pointwise convergence is not enough. (See Exercises
14.3.3, 14.3.5, 14.3.7.)


Exercise 14.3.1. Prove Theorem 14.3.1. Explain briefly why your proof
requires uniform convergence, and why pointwise convergence would not
suffice. (Hints: it is easiest to use the “epsilon-delta” definition of
continuity from Definition 13.1.1. You may find the triangle inequality

$$d_Y(f(x), f(x_0)) \leq d_Y(f(x), f^{(n)}(x)) + d_Y(f^{(n)}(x), f^{(n)}(x_0)) + d_Y(f^{(n)}(x_0), f(x_0))$$

useful. Also, you may need to divide $\varepsilon$ as $\varepsilon = \varepsilon/3 + \varepsilon/3 + \varepsilon/3$. Finally,
it is possible to prove Theorem 14.3.1 from Proposition 14.3.3, but you
may find it easier conceptually to prove Theorem 14.3.1 first.)


Exercise 14.3.2. Prove Proposition 14.3.3. (Hint: this is very similar to
Theorem 14.3.1. Theorem 14.3.1 cannot be used to prove Proposition
14.3.3, however it is possible to use Proposition 14.3.3 to prove Theorem
14.3.1.)


Exercise 14.3.3. Compare Proposition 14.3.3 with Example 1.2.8. Can
you now explain why the interchange of limits in Example 1.2.8 led to a
false statement, whereas the interchange of limits in Proposition 14.3.3
is justified?


Exercise 14.3.4. Prove Proposition 14.3.4. (Hint: again, this is similar 
to Theorem 14.3.1 and Proposition 14.3.3, although the statements are 
slightly different, and one cannot deduce this directly from the other two 
results. ) 


Exercise 14.3.5. Give an example to show that Proposition 14.3.4 fails if
the phrase “converges uniformly” is replaced by “converges pointwise”.
(Hint: some of the examples already given earlier will already work
here.)


Exercise 14.3.6. Prove Proposition 14.3.6.


Exercise 14.3.7. Give an example to show that Proposition 14.3.6 fails if 
the phrase “converges uniformly” is replaced by “converges pointwise”. 
(Hint: some of the examples already given earlier will already work 
here.) 


Exercise 14.3.8. Let $(X, d)$ be a metric space, and for every positive
integer $n$, let $f_n : X \to \mathbf{R}$ and $g_n : X \to \mathbf{R}$ be functions. Suppose that
$(f_n)_{n=1}^\infty$ converges uniformly to another function $f : X \to \mathbf{R}$, and that
$(g_n)_{n=1}^\infty$ converges uniformly to another function $g : X \to \mathbf{R}$. Suppose
also that the functions $(f_n)_{n=1}^\infty$ and $(g_n)_{n=1}^\infty$ are uniformly bounded, i.e.,
there exists an $M > 0$ such that $|f_n(x)| \leq M$ and $|g_n(x)| \leq M$ for all
$n \geq 1$ and $x \in X$. Prove that the functions $f_n g_n : X \to \mathbf{R}$ converge
uniformly to $fg : X \to \mathbf{R}$.


14.4 The metric of uniform convergence 


We have now developed at least four, apparently separate, notions 
of limit in this text: 


(a) limits $\lim_{n \to \infty} x^{(n)}$ of sequences of points in a metric space
(Definition 12.1.14; see also Definition 13.5.4);

(b) limiting values $\lim_{x \to x_0; x \in E} f(x)$ of functions at a point
(Definition 14.1.1);

(c) pointwise limits $f$ of functions $f^{(n)}$ (Definition 14.2.1); and

(d) uniform limits $f$ of functions $f^{(n)}$ (Definition 14.2.7).


This proliferation of limits may seem rather complicated. How- 
ever, we can reduce the complexity slightly by observing that (d) 
can be viewed as a special case of (a), though in doing so it should 
be cautioned that because we are now dealing with functions in- 
stead of points, the convergence is not in X or in Y, but rather in 
a new space, the space of functions from X to Y. 


Remark 14.4.1. If one is willing to work in topological spaces 
instead of metric spaces, we can also view (b) as a special case 
of (a), see Exercise 14.1.4, and (c) is also a special case of (a), 
see Exercise 14.4.4. Thus the notion of convergence in a topolog- 
ical space can be used to unify all the notions of limits we have 
encountered so far. 


Definition 14.4.2 (Metric space of bounded functions). Suppose
$(X, d_X)$ and $(Y, d_Y)$ are metric spaces. We let $B(X \to Y)$ denote
the space¹ of bounded functions from $X$ to $Y$:

$$B(X \to Y) := \{ f \mid f : X \to Y \text{ is a bounded function} \}.$$

We define a metric $d_\infty : B(X \to Y) \times B(X \to Y) \to \mathbf{R}^+$ by
defining

$$d_\infty(f, g) := \sup_{x \in X} d_Y(f(x), g(x)) = \sup\{ d_Y(f(x), g(x)) : x \in X \}$$

for all $f, g \in B(X \to Y)$. This metric is sometimes known as the
sup norm metric or the $L^\infty$ metric. We will also use $d_{B(X \to Y)}$ as
a synonym for $d_\infty$.


Notice that the distance $d_\infty(f, g)$ is always finite because $f$
and $g$ are assumed to be bounded on $X$.


¹Note that this is a set, thanks to the power set axiom (Axiom 3.10) and
the axiom of specification (Axiom 3.5).


Example 14.4.3. Let $X := [0,1]$ and $Y = \mathbf{R}$. Let $f : [0,1] \to \mathbf{R}$
and $g : [0,1] \to \mathbf{R}$ be the functions $f(x) := 2x$ and $g(x) :=
3x$. Then $f$ and $g$ are both bounded functions and thus live in
$B([0,1] \to \mathbf{R})$. The distance between them is

$$d_\infty(f, g) = \sup_{x \in [0,1]} |2x - 3x| = \sup_{x \in [0,1]} |x| = 1.$$
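The distance in this example can be estimated numerically. The sketch below is only an approximation: it replaces the supremum over $[a,b]$ by a maximum over a finite grid, which in general gives a lower bound for the true supremum. The function name `sup_distance` is hypothetical.

```python
from typing import Callable


def sup_distance(f: Callable[[float], float],
                 g: Callable[[float], float],
                 a: float, b: float, samples: int = 1001) -> float:
    """Grid estimate of sup{|f(x) - g(x)| : x in [a, b]} for real-valued f, g."""
    worst = 0.0
    for k in range(samples):
        x = a + (b - a) * k / (samples - 1)  # evenly spaced points, endpoints included
        worst = max(worst, abs(f(x) - g(x)))
    return worst


# The two functions of the example: f(x) = 2x and g(x) = 3x on [0, 1].
print(sup_distance(lambda x: 2 * x, lambda x: 3 * x, 0.0, 1.0))
```

Here the grid happens to contain the point $x = 1$ at which the supremum is attained, so the estimate agrees with the exact value up to rounding.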


This space turns out to be a metric space (Exercise 14.4.1). 
Convergence in this metric turns out to be identical to uniform 
convergence: 


Proposition 14.4.4. Let $(X, d_X)$ and $(Y, d_Y)$ be metric spaces.
Let $(f^{(n)})_{n=1}^\infty$ be a sequence of functions in $B(X \to Y)$, and let $f$
be another function in $B(X \to Y)$. Then $(f^{(n)})_{n=1}^\infty$ converges to $f$
in the metric $d_{B(X \to Y)}$ if and only if $(f^{(n)})_{n=1}^\infty$ converges uniformly
to $f$.

Proof. See Exercise 14.4.2. □


Now let $C(X \to Y)$ be the space of bounded continuous functions
from $X$ to $Y$:

$$C(X \to Y) := \{ f \in B(X \to Y) \mid f \text{ is continuous} \}.$$

This set $C(X \to Y)$ is clearly a subset of $B(X \to Y)$. Corollary
14.3.2 asserts that this space $C(X \to Y)$ is closed in $B(X \to
Y)$ (why?). Actually, we can say a lot more:


Theorem 14.4.5 (The space of continuous functions is complete).
Let $(X, d_X)$ be a metric space, and let $(Y, d_Y)$ be a complete metric
space. The space $(C(X \to Y), d_{B(X \to Y)}|_{C(X \to Y) \times C(X \to Y)})$ is
a complete subspace of $(B(X \to Y), d_{B(X \to Y)})$. In other words,
every Cauchy sequence of functions in $C(X \to Y)$ converges to a
function in $C(X \to Y)$.

Proof. See Exercise 14.4.3. □


Exercise 14.4.1. Let $(X, d_X)$ and $(Y, d_Y)$ be metric spaces. Show that
the space $B(X \to Y)$ defined in Definition 14.4.2, with the metric
$d_{B(X \to Y)}$, is indeed a metric space.




Exercise 14.4.2. Prove Proposition 14.4.4. 


Exercise 14.4.3. Prove Theorem 14.4.5. (Hint: this is similar, but not 
identical, to the proof of Theorem 14.3.1). 

Exercise 14.4.4. Let $(X, d_X)$ and $(Y, d_Y)$ be metric spaces, and let $Y^X :=
\{ f \mid f : X \to Y \}$ be the space of all functions from $X$ to $Y$ (cf. Axiom
3.10). If $x_0 \in X$ and $V$ is an open set in $Y$, let $V^{(x_0)} \subseteq Y^X$ be the set

$$V^{(x_0)} := \{ f \in Y^X : f(x_0) \in V \}.$$

If $E$ is a subset of $Y^X$, we say that $E$ is open if for every $f \in E$,
there exists a finite number of points $x_1, \ldots, x_n \in X$ and open sets
$V_1, \ldots, V_n \subseteq Y$ such that

$$f \in V_1^{(x_1)} \cap \ldots \cap V_n^{(x_n)} \subseteq E.$$

• Show that if $\mathcal{F}$ is the collection of open sets in $Y^X$, then $(Y^X, \mathcal{F})$
is a topological space.

• For each natural number $n$, let $f^{(n)} : X \to Y$ be a function from
$X$ to $Y$, and let $f : X \to Y$ be another function from $X$ to $Y$.
Show that $f^{(n)}$ converges to $f$ in the topology $\mathcal{F}$ (in the sense of
Definition 13.5.4) if and only if $f^{(n)}$ converges to $f$ pointwise (in
the sense of Definition 14.2.1).


The topology F is known as the topology of pointwise convergence, for 
obvious reasons; it is also known as the product topology. It shows that 
the concept of pointwise convergence can be viewed as a special case of 
the more general concept of convergence in a topological space. 


14.5 Series of functions; the Weierstrass M-test 


Having discussed sequences of functions, we now discuss infinite
series $\sum_{n=1}^\infty f_n$ of functions. Now we shall restrict our attention to
functions $f : X \to \mathbf{R}$ from a metric space $(X, d_X)$ to the real line
$\mathbf{R}$ (which we of course give the standard metric); this is because we
know how to add two real numbers, but don’t necessarily know
how to add two points in a general metric space $Y$. Functions
whose range is $\mathbf{R}$ are sometimes called real-valued functions.

Finite summation is, of course, easy: given any finite collection
$f^{(1)}, \ldots, f^{(N)}$ of functions from $X$ to $\mathbf{R}$, we can define the finite
sum $\sum_{i=1}^N f^{(i)} : X \to \mathbf{R}$ by

$$\left( \sum_{i=1}^N f^{(i)} \right)(x) := \sum_{i=1}^N f^{(i)}(x).$$


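Example 14.5.1. If $f^{(1)} : \mathbf{R} \to \mathbf{R}$ is the function $f^{(1)}(x) := x$,
$f^{(2)} : \mathbf{R} \to \mathbf{R}$ is the function $f^{(2)}(x) := x^2$, and $f^{(3)} : \mathbf{R} \to \mathbf{R}$
is the function $f^{(3)}(x) := x^3$, then $f := \sum_{i=1}^3 f^{(i)}$ is the function
$f : \mathbf{R} \to \mathbf{R}$ defined by $f(x) := x + x^2 + x^3$.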


It is easy to show that finite sums of bounded functions are 
bounded, and finite sums of continuous functions are continuous 
(Exercise 14.5.1). 

Now to add infinite series. 


Definition 14.5.2 (Infinite series). Let $(X, d_X)$ be a metric space.
Let $(f^{(n)})_{n=1}^\infty$ be a sequence of functions from $X$ to $\mathbf{R}$, and let $f$ be
another function from $X$ to $\mathbf{R}$. If the partial sums $\sum_{n=1}^N f^{(n)}$ converge
pointwise to $f$ on $X$ as $N \to \infty$, we say that the infinite series
$\sum_{n=1}^\infty f^{(n)}$ converges pointwise to $f$, and write $f = \sum_{n=1}^\infty f^{(n)}$.
If the partial sums $\sum_{n=1}^N f^{(n)}$ converge uniformly to $f$ on $X$ as
$N \to \infty$, we say that the infinite series $\sum_{n=1}^\infty f^{(n)}$ converges uniformly
to $f$, and again write $f = \sum_{n=1}^\infty f^{(n)}$. (Thus when one
sees an expression such as $f = \sum_{n=1}^\infty f^{(n)}$, one should look at the
context to see in what sense this infinite series converges.)


Remark 14.5.3. A series $\sum_{n=1}^\infty f^{(n)}$ converges pointwise to $f$ on
$X$ if and only if $\sum_{n=1}^\infty f^{(n)}(x)$ converges to $f(x)$ for every $x \in X$.
(Thus if $\sum_{n=1}^\infty f^{(n)}$ does not converge pointwise to $f$, this does not
mean that it diverges pointwise; it may just be that it converges
for some points $x$ but diverges at other points $x$.)


If a series $\sum_{n=1}^\infty f^{(n)}$ converges uniformly to $f$, then it also converges
pointwise to $f$; but not vice versa, as the following example
shows.


Example 14.5.4. Let $f^{(n)} : (-1,1) \to \mathbf{R}$ be the sequence of
functions $f^{(n)}(x) := x^n$. Then $\sum_{n=1}^\infty f^{(n)}$ converges pointwise,
but not uniformly, to the function $x/(1 - x)$ (Exercise 14.5.2).




It is not always clear when a series $\sum_{n=1}^\infty f^{(n)}$ converges or not.
However, there is a very useful test that gives at least one criterion for
uniform convergence.


Definition 14.5.5 (Sup norm). If $f : X \to \mathbf{R}$ is a bounded
real-valued function, we define the sup norm $\|f\|_\infty$ of $f$ to be the
number

$$\|f\|_\infty := \sup\{ |f(x)| : x \in X \}.$$

In other words, $\|f\|_\infty = d_\infty(f, 0)$, where $0 : X \to \mathbf{R}$ is the zero
function $0(x) := 0$, and $d_\infty$ was defined in Definition 14.4.2. (Why
is this the case?)


Example 14.5.6. Thus, for instance, if $f : (-2,1) \to \mathbf{R}$ is the
function $f(x) := 2x$, then $\|f\|_\infty = \sup\{ |2x| : x \in (-2,1) \} = 4$
(why?). Notice that when $f$ is bounded then $\|f\|_\infty$ will always be
a non-negative real number.


Theorem 14.5.7 (Weierstrass M-test). Let $(X, d)$ be a metric
space, and let $(f^{(n)})_{n=1}^\infty$ be a sequence of bounded real-valued continuous
functions on $X$ such that the series $\sum_{n=1}^\infty \|f^{(n)}\|_\infty$ is convergent.
(Note that this is a series of plain old real numbers, not
of functions.) Then the series $\sum_{n=1}^\infty f^{(n)}$ converges uniformly to
some function $f$ on $X$, and that function $f$ is also continuous.


Proof. See Exercise 14.5.3. □


To put the Weierstrass M-test succinctly: absolute conver- 
gence of sup norms implies uniform convergence of functions. 


Example 14.5.8. Let $0 < r < 1$ be a real number, and let
$f^{(n)} : [-r, r] \to \mathbf{R}$ be the sequence of functions $f^{(n)}(x) := x^n$. Then
each $f^{(n)}$ is continuous and bounded, and $\|f^{(n)}\|_\infty = r^n$ (why?).
Since the series $\sum_{n=1}^\infty r^n$ is absolutely convergent (e.g., by the ratio
test, Theorem 7.5.1), we thus see that $\sum_{n=1}^\infty f^{(n)}$ converges uniformly
in $[-r, r]$ to some continuous function; in Exercise 14.5.2 we see
that this function must in fact be the function $f : [-r, r] \to \mathbf{R}$
defined by $f(x) := x/(1 - x)$. In other words, the series $\sum_{n=1}^\infty x^n$
is pointwise convergent, but not uniformly convergent, on $(-1, 1)$,
but is uniformly convergent on the smaller interval $[-r, r]$ for any
$0 < r < 1$.
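The contrast between $[-r, r]$ and $(-1, 1)$ can be seen in a rough numerical sketch. It uses the closed form $x/(1-x) - \sum_{n=1}^N x^n = x^{N+1}/(1-x)$ only implicitly, by comparing a grid estimate of the worst-case error with the tail $\sum_{n > N} r^n = r^{N+1}/(1-r)$ of the series of sup norms. All function names are hypothetical.

```python
def partial_sum(x: float, N: int) -> float:
    """The partial sum x + x**2 + ... + x**N."""
    return sum(x ** n for n in range(1, N + 1))


def sup_error_on(r: float, N: int, samples: int = 2001) -> float:
    """Grid estimate of sup over [-r, r] of |partial_sum(x, N) - x/(1 - x)|."""
    worst = 0.0
    for k in range(samples):
        x = -r + 2 * r * k / (samples - 1)
        worst = max(worst, abs(partial_sum(x, N) - x / (1 - x)))
    return worst


def tail_bound(r: float, N: int) -> float:
    """Tail of the series of sup norms: sum over n > N of r**n."""
    return r ** (N + 1) / (1 - r)


# On [-0.5, 0.5] ten terms already give a small uniform error;
# on [-0.99, 0.99] the same ten terms are still far from the limit.
for r in (0.5, 0.9, 0.99):
    print(r, sup_error_on(r, 10), tail_bound(r, 10))
```

As $r$ approaches 1 the number of terms needed for a given accuracy grows without bound, which is consistent with the failure of uniform convergence on $(-1, 1)$.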


The Weierstrass M-test is especially useful in relation to power 
series, which we will encounter in the next chapter. 


Exercise 14.5.1. Let $f^{(1)}, \ldots, f^{(N)}$ be a finite sequence of bounded functions
from a metric space $(X, d_X)$ to $\mathbf{R}$. Show that $\sum_{i=1}^N f^{(i)}$ is also
bounded. Prove a similar claim when “bounded” is replaced by “continuous”.
What if “continuous” was replaced by “uniformly continuous”?


Exercise 14.5.2. Verify the claim in Example 14.5.4.

Exercise 14.5.3. Prove Theorem 14.5.7. (Hint: first show that the sequence
$\sum_{i=1}^N f^{(i)}$ is a Cauchy sequence in $C(X \to \mathbf{R})$. Then use Theorem
14.4.5.)


14.6 Uniform convergence and integration 


We now connect uniform convergence with Riemann integration 
(which was discussed in Chapter 11), by showing that uniform 
limits can be safely interchanged with integrals. 


Theorem 14.6.1. Let $[a, b]$ be an interval, and for each integer
$n \geq 1$, let $f^{(n)} : [a, b] \to \mathbf{R}$ be a Riemann-integrable function.
Suppose $f^{(n)}$ converges uniformly on $[a, b]$ to a function $f : [a, b] \to
\mathbf{R}$. Then $f$ is also Riemann integrable, and

$$\lim_{n \to \infty} \int_{[a,b]} f^{(n)} = \int_{[a,b]} f.$$


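Proof. We first show that $f$ is Riemann integrable on $[a, b]$. This is
the same as showing that the upper and lower Riemann integrals
of $f$ match: $\underline{\int}_{[a,b]} f = \overline{\int}_{[a,b]} f$.

Let $\varepsilon > 0$. Since $f^{(n)}$ converges uniformly to $f$, we see that
there exists an $N > 0$ such that $|f^{(n)}(x) - f(x)| < \varepsilon$ for all $n > N$
and $x \in [a, b]$. In particular we have

$$f^{(n)}(x) - \varepsilon < f(x) < f^{(n)}(x) + \varepsilon$$

for all $x \in [a, b]$. Integrating this on $[a, b]$ we obtain

$$\underline{\int}_{[a,b]} (f^{(n)} - \varepsilon) \leq \underline{\int}_{[a,b]} f \leq \overline{\int}_{[a,b]} f \leq \overline{\int}_{[a,b]} (f^{(n)} + \varepsilon).$$

Since $f^{(n)}$ is assumed to be Riemann integrable, we thus see

$$\left( \int_{[a,b]} f^{(n)} \right) - \varepsilon (b - a) \leq \underline{\int}_{[a,b]} f \leq \overline{\int}_{[a,b]} f \leq \left( \int_{[a,b]} f^{(n)} \right) + \varepsilon (b - a).$$

In particular, we see that

$$0 \leq \overline{\int}_{[a,b]} f - \underline{\int}_{[a,b]} f \leq 2 \varepsilon (b - a).$$

Since this is true for every $\varepsilon > 0$, we obtain $\underline{\int}_{[a,b]} f = \overline{\int}_{[a,b]} f$ as
desired.

The above argument also shows that for every $\varepsilon > 0$ there
exists an $N > 0$ such that

$$\left| \int_{[a,b]} f^{(n)} - \int_{[a,b]} f \right| \leq 2 \varepsilon (b - a)$$

for all $n > N$. Since $\varepsilon$ is arbitrary, we see that $\int_{[a,b]} f^{(n)}$ converges
to $\int_{[a,b]} f$ as desired. □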


To rephrase Theorem 14.6.1: we can rearrange limits and integrals
(on compact intervals $[a, b]$),

$$\lim_{n \to \infty} \int_{[a,b]} f^{(n)} = \int_{[a,b]} \lim_{n \to \infty} f^{(n)},$$

provided that the convergence is uniform. This should be contrasted
with Example 14.2.5 and Example 1.2.9.
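A numerical illustration may help here. The sketch below uses a hypothetical sequence that is not taken from the text, $f_n(x) := x + \sin(nx)/n$ on $[0,1]$, which converges uniformly to $f(x) := x$ with $\sup |f_n - f| \leq 1/n$. Integrals are approximated by midpoint Riemann sums, so the printed numbers are approximations only.

```python
import math
from typing import Callable


def midpoint_integral(f: Callable[[float], float],
                      a: float, b: float, pieces: int = 2000) -> float:
    """Midpoint Riemann sum approximating the integral of f over [a, b]."""
    h = (b - a) / pieces
    return h * sum(f(a + (k + 0.5) * h) for k in range(pieces))


def f_n(n: int) -> Callable[[float], float]:
    """Hypothetical example sequence: f_n(x) = x + sin(n x)/n."""
    return lambda x: x + math.sin(n * x) / n


limit_integral = midpoint_integral(lambda x: x, 0.0, 1.0)
for n in (1, 10, 100):
    gap = abs(midpoint_integral(f_n(n), 0.0, 1.0) - limit_integral)
    # The gap is controlled by (sup |f_n - f|) * (b - a), here at most 1/n.
    print(n, gap, 1.0 / n)
```

The gap between the integrals is bounded by the uniform distance times the length of the interval, which mirrors the $\varepsilon (b - a)$ estimates in the proof.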
There is an analogue of this theorem for series: 


Corollary 14.6.2. Let $[a, b]$ be an interval, and let $(f^{(n)})_{n=1}^\infty$ be
a sequence of Riemann integrable functions on $[a, b]$ such that the
series $\sum_{n=1}^\infty f^{(n)}$ is uniformly convergent. Then we have

$$\sum_{n=1}^\infty \int_{[a,b]} f^{(n)} = \int_{[a,b]} \sum_{n=1}^\infty f^{(n)}.$$

Proof. See Exercise 14.6.1. □


This corollary works particularly well in conjunction with the
Weierstrass M-test (Theorem 14.5.7):


Example 14.6.3. (Informal) From Lemma 7.3.3 we have the geometric
series identity

$$\sum_{n=1}^\infty x^n = \frac{x}{1 - x}$$

for $x \in (-1, 1)$, and the convergence is uniform (by the Weierstrass
M-test) on $[-r, r]$ for any $0 < r < 1$. By adding 1 to both sides
we obtain

$$\sum_{n=0}^\infty x^n = \frac{1}{1 - x}$$

and the convergence is again uniform. We can thus integrate on $[0, r]$
and use Corollary 14.6.2 to obtain

$$\sum_{n=0}^\infty \int_{[0,r]} x^n \, dx = \int_{[0,r]} \frac{1}{1 - x} \, dx.$$

The left-hand side is $\sum_{n=0}^\infty r^{n+1}/(n + 1)$. If we accept for now
the use of logarithms (we will justify this use in Section 15.5), the
anti-derivative of $1/(1 - x)$ is $-\log(1 - x)$, and so the right-hand
side is $-\log(1 - r)$. We thus obtain the formula

$$-\log(1 - r) = \sum_{n=0}^\infty \frac{r^{n+1}}{n + 1}$$

for all $0 < r < 1$.
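The final formula is easy to sanity-check numerically. This is a minimal sketch that truncates the series after a fixed number of terms (200 is an arbitrary choice, and the name `log_series` is hypothetical); it compares the result with the library logarithm.

```python
import math


def log_series(r: float, terms: int = 200) -> float:
    """Truncated series: sum of r**(n+1) / (n+1) for n = 0, ..., terms - 1."""
    return sum(r ** (n + 1) / (n + 1) for n in range(terms))


for r in (0.1, 0.5, 0.9):
    print(r, log_series(r), -math.log(1 - r))
```

For $r$ close to 1 many more terms are needed for the same accuracy, in line with the fact that the convergence is uniform only on $[-r, r]$ with $r < 1$.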


Exercise 14.6.1. Use Theorem 14.6.1 to prove Corollary 14.6.2. 




14.7 Uniform convergence and derivatives 


We have already seen how uniform convergence interacts well with 
continuity, with limits, and with integrals. Now we investigate 
how it interacts with derivatives. 

The first question we can ask is: if $f_n$ converges uniformly to
$f$, and the functions $f_n$ are differentiable, does this imply that $f$
is also differentiable? And does $f_n'$ also converge to $f'$?

The answer to the second question is, unfortunately, no. To
see a counterexample, we will use without proof some basic facts
about trigonometric functions (which we will make rigorous in
Section 15.7). Consider the functions $f_n : [0, 2\pi] \to \mathbf{R}$ defined
by $f_n(x) := n^{-1/2} \sin(nx)$, and let $f : [0, 2\pi] \to \mathbf{R}$ be the zero
function $f(x) := 0$. Then, since $\sin$ takes values between $-1$
and $1$, we have $d_\infty(f_n, f) \leq n^{-1/2}$, where we use the uniform
metric $d_\infty(f, g) := \sup_{x \in [0, 2\pi]} |f(x) - g(x)|$ introduced in Definition
14.4.2. Since $n^{-1/2}$ converges to 0, we thus see by the
squeeze test that $f_n$ converges uniformly to $f$. On the other hand,
$f_n'(x) = n^{1/2} \cos(nx)$, and so in particular $|f_n'(0) - f'(0)| = n^{1/2}$.
Thus $f_n'$ does not converge pointwise to $f'$, and so in particular
does not converge uniformly either. In particular we have

$$\frac{d}{dx} \lim_{n \to \infty} f_n(x) \neq \lim_{n \to \infty} \frac{d}{dx} f_n(x).$$
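The two competing behaviours in this counterexample can be tabulated with a short sketch. The derivative formula is the one stated above (it is used here, not proved), the supremum is estimated on a finite grid, and the function names are hypothetical.

```python
import math


def f_n(n: int, x: float) -> float:
    """f_n(x) = n**(-1/2) * sin(n x)."""
    return math.sin(n * x) / math.sqrt(n)


def f_n_prime(n: int, x: float) -> float:
    """The derivative quoted in the text: n**(1/2) * cos(n x)."""
    return math.sqrt(n) * math.cos(n * x)


def sup_abs_f_n(n: int, samples: int = 10001) -> float:
    """Grid estimate of sup{|f_n(x)| : x in [0, 2 pi]}."""
    return max(abs(f_n(n, 2 * math.pi * k / (samples - 1))) for k in range(samples))


# The functions shrink uniformly while their derivatives at 0 blow up.
for n in (1, 100, 10000):
    print(n, sup_abs_f_n(n), f_n_prime(n, 0.0))
```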


The answer to the first question is also no. An example is
the sequence of functions $f_n : [-1, 1] \to \mathbf{R}$ defined by $f_n(x) :=
\sqrt{\frac{1}{n^2} + x^2}$. These functions are differentiable (why?). Also, one
can easily check that

$$|x| \leq f_n(x) \leq |x| + \frac{1}{n}$$

for all $x \in [-1, 1]$ (why? square both sides), and so by the squeeze
test $f_n$ converges uniformly to the absolute value function $f(x) :=
|x|$. But this function is not differentiable at 0 (why?). Thus, the
uniform limit of differentiable functions need not be differentiable.
(See also Example 1.2.10).
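The squeeze $|x| \leq f_n(x) \leq |x| + 1/n$ can be checked on a grid. This is only a sketch under the assumption that a finite sample of $[-1, 1]$ is representative; the name `sup_gap` is hypothetical.

```python
import math


def f_n(n: int, x: float) -> float:
    """f_n(x) = sqrt(1/n**2 + x**2)."""
    return math.sqrt(1.0 / n ** 2 + x * x)


def sup_gap(n: int, samples: int = 2001) -> float:
    """Grid estimate of sup over [-1, 1] of f_n(x) - |x|."""
    worst = 0.0
    for k in range(samples):
        x = -1.0 + 2.0 * k / (samples - 1)
        worst = max(worst, f_n(n, x) - abs(x))
    return worst


# The largest gap occurs at x = 0, where it equals 1/n.
for n in (1, 10, 1000):
    print(n, sup_gap(n), 1.0 / n)
```

The gap is largest exactly at the corner $x = 0$ of the limit function, where each $f_n$ is still smooth.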




So, in summary, uniform convergence of the functions $f_n$ says
nothing about the convergence of the derivatives $f_n'$. However, the
converse is true, as long as $f_n$ converges at at least one point:


Theorem 14.7.1. Let $[a, b]$ be an interval, and for every integer
$n \geq 1$, let $f_n : [a, b] \to \mathbf{R}$ be a differentiable function whose derivative
$f_n' : [a, b] \to \mathbf{R}$ is continuous. Suppose that the derivatives
$f_n'$ converge uniformly to a function $g : [a, b] \to \mathbf{R}$. Suppose also
that there exists a point $x_0$ such that the limit $\lim_{n \to \infty} f_n(x_0)$ exists.
Then the functions $f_n$ converge uniformly to a differentiable
function $f$, and the derivative of $f$ equals $g$.


Informally, the above theorem says that if $f_n'$ converges uniformly,
and $f_n(x_0)$ converges for some $x_0$, then $f_n$ also converges
uniformly, and $\frac{d}{dx} \lim_{n \to \infty} f_n(x) = \lim_{n \to \infty} \frac{d}{dx} f_n(x)$.


Proof. We only give the beginning of the proof here; the remainder
of the proof will be an exercise (Exercise 14.7.1).

Since $f_n'$ is continuous, we see from the fundamental theorem
of calculus (Theorem 11.9.4) that

$$f_n(x) - f_n(x_0) = \int_{[x_0, x]} f_n'$$

when $x \in [x_0, b]$, and

$$f_n(x) - f_n(x_0) = -\int_{[x, x_0]} f_n'$$

when $x \in [a, x_0]$.

Let $L$ be the limit of $f_n(x_0)$ as $n \to \infty$:

$$L := \lim_{n \to \infty} f_n(x_0).$$

By hypothesis, $L$ exists. Now, since each $f_n'$ is continuous on $[a, b]$,
and $f_n'$ converges uniformly to $g$, we see by Corollary 14.3.2 that
$g$ is also continuous. Now define the function $f : [a, b] \to \mathbf{R}$ by
setting

$$f(x) := L - \int_{[a, x_0]} g + \int_{[a, x]} g$$

for all $x \in [a, b]$. To finish the proof, we have to show that $f_n$ converges
uniformly to $f$, and that $f$ is differentiable with derivative
$g$; this shall be done in Exercise 14.7.1. □


Remark 14.7.2. It turns out that Theorem 14.7.1 is still true
when the functions $f_n'$ are not assumed to be continuous, but the
proof is more difficult; see Exercise 14.7.2.


By combining this theorem with the Weierstrass M-test, we 
obtain 


Corollary 14.7.3. Let $[a, b]$ be an interval, and for every integer
$n \geq 1$, let $f_n : [a, b] \to \mathbf{R}$ be a differentiable function whose
derivative $f_n' : [a, b] \to \mathbf{R}$ is continuous. Suppose that the series
$\sum_{n=1}^\infty \|f_n'\|_\infty$ is absolutely convergent, where

$$\|f_n'\|_\infty := \sup_{x \in [a, b]} |f_n'(x)|$$

is the sup norm of $f_n'$, as defined in Definition 14.5.5. Suppose
also that the series $\sum_{n=1}^\infty f_n(x_0)$ is convergent for some $x_0 \in [a, b]$.
Then the series $\sum_{n=1}^\infty f_n$ converges uniformly on $[a, b]$ to a differentiable
function, and in fact

$$\frac{d}{dx} \sum_{n=1}^\infty f_n(x) = \sum_{n=1}^\infty \frac{d}{dx} f_n(x)$$

for all $x \in [a, b]$.

Proof. See Exercise 14.7.3. □


We now pause to give an example of a function which is continuous
everywhere, but differentiable nowhere (this particular example
was discovered by Weierstrass). Again, we will presume
knowledge of the trigonometric functions, which will be covered
rigorously in Section 15.7.


Example 14.7.4. Let $f : \mathbf{R} \to \mathbf{R}$ be the function

$$f(x) := \sum_{n=1}^\infty 4^{-n} \cos(32^n \pi x).$$

Note that this series is uniformly convergent, thanks to the Weierstrass
M-test, and since each individual function $4^{-n} \cos(32^n \pi x)$
is continuous, the function $f$ is also continuous. However, it is
not differentiable (Exercise 15.7.10); in fact it is a nowhere differentiable
function, one which is not differentiable at any point,
despite being continuous everywhere!
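The uniform convergence claimed here can be made concrete: since $|4^{-n} \cos(32^n \pi x)| \leq 4^{-n}$, the partial sums differ from $f$ by at most $\sum_{n > N} 4^{-n} = 4^{-N}/3$, whatever $x$ is. The sketch below evaluates partial sums in floating point (so the cosines of very large arguments are only approximate) and uses hypothetical function names.

```python
import math


def weierstrass_partial(x: float, N: int) -> float:
    """Partial sum: sum of 4**(-n) * cos(32**n * pi * x) for n = 1, ..., N."""
    return sum(4.0 ** (-n) * math.cos(32.0 ** n * math.pi * x)
               for n in range(1, N + 1))


def tail_bound(N: int) -> float:
    """Sum over n > N of 4**(-n), which equals 4**(-N) / 3."""
    return 4.0 ** (-N) / 3.0


# Successive partial sums stay within the x-independent tail bound.
for x in (0.0, 0.1, 0.37):
    diff = abs(weierstrass_partial(x, 8) - weierstrass_partial(x, 4))
    print(x, diff, tail_bound(4))
```

The bound does not depend on $x$, which is the content of the M-test in this example; it says nothing about differentiability, which fails for other reasons.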


Exercise 14.7.1. Complete the proof of Theorem 14.7.1. Compare this 
theorem with Example 1.2.10, and explain why this example does not 
contradict the theorem. 


Exercise 14.7.2. Prove Theorem 14.7.1 without assuming that $f_n'$ is
continuous. (This means that you cannot use the fundamental theorem of
calculus. However, the mean value theorem (Corollary 10.2.9) is still
available. Use this to show that if $d_\infty(f_n', f_m') \leq \varepsilon$, then $|(f_n(x) -
f_m(x)) - (f_n(x_0) - f_m(x_0))| \leq \varepsilon |x - x_0|$ for all $x \in [a, b]$, and then use
this to complete the proof of Theorem 14.7.1.)


14.8 Uniform approximation by polynomials 


As we have just seen, continuous functions can be very badly be- 
haved, for instance they can be nowhere differentiable (Example 
14.7.4). On the other hand, functions such as polynomials are al- 
ways very well behaved, in particular being always differentiable. 
Fortunately, while most continuous functions are not as well be- 
haved as polynomials, they can always be uniformly approximated 
by polynomials; this important (but difficult) result is known as 
the Weierstrass approximation theorem, and is the subject of this 
section. 


Definition 14.8.1. Let $[a, b]$ be an interval. A polynomial on
$[a, b]$ is a function $f : [a, b] \to \mathbf{R}$ of the form $f(x) := \sum_{j=0}^n c_j x^j$,
where $n \geq 0$ is an integer and $c_0, \ldots, c_n$ are real numbers. If
$c_n \neq 0$, then $n$ is called the degree of $f$.


Example 14.8.2. The function $f : [1, 2] \to \mathbf{R}$ defined by $f(x) :=
3x^4 + 2x^3 - 4x + 5$ is a polynomial on $[1, 2]$ of degree 4.
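As a small illustration of this definition, a polynomial can be stored as its list of coefficients $c_0, \ldots, c_n$ and evaluated by Horner's rule. The representation and the function names below are assumptions made for the sketch, not notation from the text.

```python
from typing import Optional, Sequence


def evaluate_polynomial(coefficients: Sequence[float], x: float) -> float:
    """Evaluate sum_j c_j * x**j by Horner's rule; coefficients = [c_0, ..., c_n]."""
    result = 0.0
    for c in reversed(coefficients):
        result = result * x + c
    return result


def degree(coefficients: Sequence[float]) -> Optional[int]:
    """Largest j with c_j != 0, or None if every coefficient vanishes."""
    for j in range(len(coefficients) - 1, -1, -1):
        if coefficients[j] != 0:
            return j
    return None


# The polynomial 3x^4 + 2x^3 - 4x + 5 from the example, as [c_0, ..., c_4].
example = [5.0, -4.0, 0.0, 2.0, 3.0]
print(degree(example), evaluate_polynomial(example, 1.0), evaluate_polynomial(example, 2.0))
```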




Theorem 14.8.3 (Weierstrass approximation theorem). If $[a, b]$
is an interval, $f : [a, b] \to \mathbf{R}$ is a continuous function, and $\varepsilon > 0$,
then there exists a polynomial $P$ on $[a, b]$ such that $d_\infty(P, f) \leq \varepsilon$
(i.e., $|P(x) - f(x)| \leq \varepsilon$ for all $x \in [a, b]$).

Another way of stating this theorem is as follows. Recall that
$C([a, b] \to \mathbf{R})$ was the space of continuous functions from $[a, b]$ to
$\mathbf{R}$, with the uniform metric $d_\infty$. Let $P([a, b] \to \mathbf{R})$ be the space of
all polynomials on $[a, b]$; this is a subspace of $C([a, b] \to \mathbf{R})$, since
all polynomials are continuous (Exercise 9.4.7). The Weierstrass
approximation theorem then asserts that every continuous function
is an adherent point of $P([a, b] \to \mathbf{R})$; or in other words, that
the closure of the space of polynomials is the space of continuous
functions:

$$\overline{P([a, b] \to \mathbf{R})} = C([a, b] \to \mathbf{R}).$$

In particular, every continuous function on $[a, b]$ is the uniform
limit of polynomials. Another way of saying this is that the space
of polynomials is dense in the space of continuous functions, in
the uniform topology.

The proof of the Weierstrass approximation theorem is some- 
what complicated and will be done in stages. We first need the 
notion of an approximation to the identity. 


Definition 14.8.4 (Compactly supported functions). Let $[a, b]$ be
an interval. A function $f : \mathbf{R} \to \mathbf{R}$ is said to be supported on $[a, b]$
iff $f(x) = 0$ for all $x \notin [a, b]$. We say that $f$ is compactly supported
iff it is supported on some interval $[a, b]$. If $f$ is continuous and
supported on $[a, b]$, we define the improper integral $\int_{-\infty}^\infty f$ to be

$$\int_{-\infty}^\infty f := \int_{[a, b]} f.$$


Note that a function can be supported on more than one interval,
for instance a function which is supported on $[3, 4]$ is also
automatically supported on $[2, 5]$ (why?). In principle, this might
mean that our definition of $\int_{-\infty}^\infty f$ is not well defined, however this
is not the case:


Lemma 14.8.5. If $f : \mathbf{R} \to \mathbf{R}$ is continuous and supported on
an interval $[a, b]$, and is also supported on another interval $[c, d]$,
then $\int_{[a, b]} f = \int_{[c, d]} f$.

Proof. See Exercise 14.8.1. □


Definition 14.8.6 (Approximation to the identity). Let $\varepsilon > 0$
and $0 < \delta < 1$. A function $f : \mathbf{R} \to \mathbf{R}$ is said to be an $(\varepsilon, \delta)$-approximation
to the identity if it obeys the following three properties:

(a) $f$ is supported on $[-1, 1]$, and $f(x) \geq 0$ for all $-1 \leq x \leq 1$.

(b) $f$ is continuous, and $\int_{-\infty}^\infty f = 1$.

(c) $|f(x)| \leq \varepsilon$ for all $\delta \leq |x| \leq 1$.


Remark 14.8.7. For those of you who are familiar with the Dirac 
delta function, approximations to the identity are ways to approx- 
imate this (very discontinuous) delta function by a continuous 
function (which is easier to analyze). We will not however discuss 
the Dirac delta function in this text. 


Our proof of the Weierstrass approximation theorem relies on 
three key facts. The first fact is that polynomials can be approx- 
imations to the identity: 


Lemma 14.8.8 (Polynomials can approximate the identity). For
every $\varepsilon > 0$ and $0 < \delta < 1$ there exists an $(\varepsilon, \delta)$-approximation to
the identity which is a polynomial $P$ on $[-1, 1]$.


Proof. See Exercise 14.8.8. □


We will use these polynomial approximations to the identity to 
approximate continuous functions by polynomials. We will need 
the following important notion of a convolution. 


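Definition 14.8.9 (Convolution). Let $f : \mathbf{R} \to \mathbf{R}$ and $g : \mathbf{R} \to
\mathbf{R}$ be continuous, compactly supported functions. We define the
convolution $f * g : \mathbf{R} \to \mathbf{R}$ of $f$ and $g$ to be the function

$$(f * g)(x) := \int_{-\infty}^\infty f(y) g(x - y) \, dy.$$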


14.8. Uniform approximation by polynomials 467 


Note that if f and g are continuous and compactly supported, 
then for each x the function f(y)g(x—y) (thought of as a function 
of y) is also continuous and compactly supported, so the above 
definition makes sense. 
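The definition can be explored numerically. The sketch below approximates (f * g)(x) by a midpoint Riemann sum for two sample functions and compares it with (g * f)(x). The functions `tent` and `bump`, the integration window [−3, 3], and the step count are assumptions made for this illustration only.

```python
def integrate(h, lo, hi, steps=2000):
    """Midpoint-rule approximation of the integral of h over [lo, hi]."""
    width = (hi - lo) / steps
    return width * sum(h(lo + (i + 0.5) * width) for i in range(steps))


def tent(x):
    """A continuous function supported on [-1, 1]."""
    return max(0.0, 1.0 - abs(x))


def bump(x):
    """A continuous function supported on [0, 1]."""
    return x * (1.0 - x) if 0.0 <= x <= 1.0 else 0.0


def convolve(f, g, x, lo=-3.0, hi=3.0):
    """Approximate (f * g)(x) = integral of f(y) g(x - y) dy.

    The window [lo, hi] must contain every y where f(y) g(x - y) is non-zero.
    """
    return integrate(lambda y: f(y) * g(x - y), lo, hi)


x = 0.4
left = convolve(bump, tent, x)
right = convolve(tent, bump, x)
print(left, right)
```

Because both functions are compactly supported, the integrand vanishes outside a bounded window, so a finite Riemann sum is a reasonable stand-in for the improper integral.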


Remark 14.8.10. Convolutions play an important rôle in Fourier 
analysis and in partial differential equations (PDE), and are also 
important in physics, engineering, and signal processing. An in- 
depth study of convolution is beyond the scope of this text; only 
a brief treatment will be given here. 


Proposition 14.8.11 (Basic properties of convolution). Let
f : R → R, g : R → R, and h : R → R be continuous, compactly
supported functions. Then the following statements are true.


(a) The convolution f * g is also a continuous, compactly sup-
ported function.


(b) (Convolution is commutative) We have f * g = g * f; in other
words


$$\begin{aligned}
f * g(x) &= \int_{-\infty}^{\infty} f(y)\, g(x - y)\, dy \\
&= \int_{-\infty}^{\infty} g(y)\, f(x - y)\, dy \\
&= g * f(x).
\end{aligned}$$


(c) (Convolution is linear) We have f * (g + h) = f * g + f * h.
Also, for any real number c, we have f * (cg) = (cf) * g =
c(f * g).


Proof. See Exercise 14.8.4. □


Remark 14.8.12. There are many other important properties of
convolution, for instance it is associative, (f * g) * h = f * (g * h),
and it commutes with derivatives, (f * g)′ = f′ * g = f * g′, when
f and g are differentiable. The Dirac delta function δ mentioned
earlier is an identity for convolution: f * δ = δ * f = f. These
results are slightly harder to prove than the ones in Proposition
14.8.11, however, and we will not need them in this text.




As mentioned earlier, the proof of the Weierstrass approxima-
tion theorem relies on three facts. The second key fact is that
convolution with polynomials produces another polynomial:


Lemma 14.8.13. Let f : R → R be a continuous function sup-
ported on [0,1], and let g : R → R be a continuous function sup-
ported on [−1,1] which is a polynomial on [−1,1]. Then f * g is a
polynomial on [0,1]. (Note however that it may be non-polynomial
outside of [0,1].)


Proof. Since g is polynomial on [−1,1], we may find an integer
n ≥ 0 and real numbers $c_0, c_1, \ldots, c_n$ such that


$$g(x) = \sum_{j=0}^{n} c_j x^j \quad \text{for all } x \in [-1,1].$$


On the other hand, for all x ∈ [0,1], we have


$$f * g(x) = \int_{-\infty}^{\infty} f(y)\, g(x - y)\, dy = \int_{[0,1]} f(y)\, g(x - y)\, dy$$


since f is supported on [0,1]. Since x ∈ [0,1] and the variable of
integration y is also in [0,1], we have x − y ∈ [−1,1]. Thus we
may substitute in our formula for g to obtain


$$f * g(x) = \int_{[0,1]} f(y) \sum_{j=0}^{n} c_j (x - y)^j \, dy.$$


We expand this using the binomial formula (Exercise 7.1.4) to
obtain


$$f * g(x) = \int_{[0,1]} f(y) \sum_{j=0}^{n} c_j \sum_{k=0}^{j} \frac{j!}{k!(j-k)!}\, x^k (-y)^{j-k} \, dy.$$


We can interchange the two summations (by Corollary 7.1.14) to
obtain


$$f * g(x) = \int_{[0,1]} f(y) \sum_{k=0}^{n} \sum_{j=k}^{n} c_j \frac{j!}{k!(j-k)!}\, x^k (-y)^{j-k} \, dy$$


(why did the limits of summation change? It may help to plot j
and k on a graph). Now we interchange the k summation with
the integral, and observe that $x^k$ is independent of y, to obtain


$$f * g(x) = \sum_{k=0}^{n} x^k \int_{[0,1]} f(y) \sum_{j=k}^{n} c_j \frac{j!}{k!(j-k)!}\, (-y)^{j-k} \, dy.$$


If we thus define


$$C_k := \int_{[0,1]} f(y) \sum_{j=k}^{n} c_j \frac{j!}{k!(j-k)!}\, (-y)^{j-k} \, dy$$


for each k = 0,...,n, then $C_k$ is a number which is independent
of x, and we have


$$f * g(x) = \sum_{k=0}^{n} C_k x^k$$


for all x ∈ [0,1]. Thus f * g is a polynomial on [0,1]. □


The third key fact is that if one convolves a uniformly contin- 
uous function with an approximation to the identity, we obtain a 
new function which is close to the original function (which explains 
the terminology “approximation to the identity” ): 


Lemma 14.8.14. Let f : R → R be a continuous function sup-
ported on [0,1], which is bounded by some M > 0 (i.e., |f(x)| ≤ M
for all x ∈ R), and let ε > 0 and 0 < δ < 1 be such that one has
|f(x) − f(y)| < ε whenever x, y ∈ R and |x − y| < δ. Let g be any
(ε, δ)-approximation to the identity. Then we have


|f * g(x) − f(x)| ≤ (3M + 2δ)ε

for all x ∈ [0,1].

Proof. See Exercise 14.8.6. □


Combining these together, we obtain a preliminary version of 
the Weierstrass approximation theorem: 




Corollary 14.8.15 (Weierstrass approximation theorem I). Let
f : R → R be a continuous function supported on [0,1]. Then
for every ε > 0, there exists a function P : R → R which is
polynomial on [0,1] and such that |P(x) − f(x)| ≤ ε for all x ∈
[0,1].


Proof. See Exercise 14.8.7. □
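The mechanism behind Corollary 14.8.15 can be observed numerically. The sketch below convolves a sample function supported on [0,1] with the normalized kernel c(1 − y²)^N from the hint to Exercise 14.8.2(c) and measures the largest error on a grid of [0,1]. By Lemma 14.8.13 the convolution is a polynomial on [0,1]; here it is only evaluated pointwise by a Riemann sum. The sample function `tent`, the values N = 20 and N = 200, and the grid sizes are assumptions made for illustration.

```python
def integrate(h, lo, hi, steps=2000):
    """Midpoint-rule approximation of the integral of h over [lo, hi]."""
    width = (hi - lo) / steps
    return width * sum(h(lo + (i + 0.5) * width) for i in range(steps))


def make_kernel(N):
    """Normalized kernel c (1 - t^2)^N on [-1, 1], zero elsewhere."""
    mass = integrate(lambda t: (1.0 - t * t) ** N, -1.0, 1.0)
    return lambda t: (1.0 - t * t) ** N / mass if abs(t) <= 1.0 else 0.0


def tent(x):
    """Continuous, supported on [0, 1], and not a polynomial on [0, 1]."""
    return min(x, 1.0 - x) if 0.0 <= x <= 1.0 else 0.0


def smoothed(f, kernel, x):
    """Approximate (f * g)(x) as the integral of f(x - y) g(y) over [-1, 1]."""
    return integrate(lambda y: f(x - y) * kernel(y), -1.0, 1.0)


def sup_error(N, points=21):
    """Largest value of |f * g - f| over an evenly spaced grid of [0, 1]."""
    kernel = make_kernel(N)
    xs = [i / (points - 1) for i in range(points)]
    return max(abs(smoothed(tent, kernel, x) - tent(x)) for x in xs)


err_coarse = sup_error(20)
err_fine = sup_error(200)
print(err_coarse, err_fine)
```

The error shrinks as N grows, in line with Lemma 14.8.14: a sharper approximation to the identity gives a convolution closer to the original function.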


Now we perform a series of modifications to convert Corollary 
14.8.15 into the actual Weierstrass approximation theorem. We 
first need a simple lemma. 


Lemma 14.8.16. Let f : [0,1] → R be a continuous function
which equals 0 on the boundary of [0,1], i.e., f(0) = f(1) = 0. Let
F : R → R be the function defined by setting F(x) := f(x) for
x ∈ [0,1] and F(x) := 0 for x ∉ [0,1]. Then F is also continuous.


Proof. See Exercise 14.8.16. □


Remark 14.8.17. The function F obtained in Lemma 14.8.16 is 
sometimes known as the extension of f by zero. 


From Corollary 14.8.15 and Lemma 14.8.16 we immediately 
obtain 


Corollary 14.8.18 (Weierstrass approximation theorem II). Let
f : [0,1] → R be a continuous function supported on [0,1] such
that f(0) = f(1) = 0. Then for every ε > 0 there exists a polyno-
mial P : [0,1] → R such that |P(x) − f(x)| ≤ ε for all x ∈ [0,1].


Now we strengthen Corollary 14.8.18 by removing the assump- 
tion that f(0) = f(1) = 0. 


Corollary 14.8.19 (Weierstrass approximation theorem III). Let
f : [0,1] → R be a continuous function supported on [0,1]. Then
for every ε > 0 there exists a polynomial P : [0,1] → R such that
|P(x) − f(x)| ≤ ε for all x ∈ [0,1].




Proof. Let F : [0,1] → R denote the function


F(x) := f(x) − f(0) − x(f(1) − f(0)).


Observe that F is also continuous (why?), and that F(0) = F(1) =
0. By Corollary 14.8.18, we can thus find a polynomial Q : [0,1] →
R such that |Q(x) − F(x)| ≤ ε for all x ∈ [0,1]. But


Q(x) − F(x) = Q(x) + f(0) + x(f(1) − f(0)) − f(x),


so the claim follows by setting P to be the polynomial P(x) :=
Q(x) + f(0) + x(f(1) − f(0)). □


Finally, we can prove the full Weierstrass approximation the- 
orem. 


Proof of Theorem 14.8.3. Let f : [a,b] → R be a continuous func-
tion on [a,b]. Let g : [0,1] → R denote the function


g(x) := f(a + (b − a)x) for all x ∈ [0,1].

Observe then that


f(y) = g((y − a)/(b − a)) for all y ∈ [a,b].


The function g is continuous on [0,1] (why?), and so by Corollary
14.8.19 we may find a polynomial Q : [0,1] → R such that |Q(x) −
g(x)| ≤ ε for all x ∈ [0,1]. In particular, for any y ∈ [a,b], we
have


|Q((y − a)/(b − a)) − g((y − a)/(b − a))| ≤ ε.


If we thus set P(y) := Q((y − a)/(b − a)), then we observe that P
is also a polynomial (why?), and since g((y − a)/(b − a)) = f(y)
we have |P(y) − f(y)| ≤ ε for all y ∈ [a,b], as desired. □


Remark 14.8.20. Note that the Weierstrass approximation the-
orem only works on bounded intervals [a,b]; continuous func-
tions on R cannot in general be uniformly approximated by
polynomials. For instance, the exponential function f : R → R
defined by $f(x) := e^x$ (which we shall study rigorously in Section 15.5)
cannot be approximated by any polynomial, because exponential
functions grow faster than any polynomial (Exercise 15.5.9) and
so there is no way one can even make the sup metric between f
and a polynomial finite.


Remark 14.8.21. There is a generalization of the Weierstrass
approximation theorem to higher dimensions: if K is any compact
subset of $\mathbf{R}^n$ (with the Euclidean metric $d_{l^2}$), and f : K → R is a
continuous function, then for every ε > 0 there exists a polynomial
P : K → R of n variables $x_1, \ldots, x_n$ such that $d_\infty(f, P) < \varepsilon$. This
general theorem can be proven by a more complicated variant of
the arguments here, but we will not do so. (There is in fact an even
more general version of this theorem applicable to an arbitrary
metric space, known as the Stone-Weierstrass theorem, but this is
beyond the scope of this text.)


Exercise 14.8.1. Prove Lemma 14.8.5. 


Exercise 14.8.2. (a) Prove that for any real number 0 ≤ y ≤ 1 and
any natural number n ≥ 0, we have $(1 - y)^n \geq 1 - ny$. (Hint: induct
on n. Alternatively, differentiate with respect to y.)

(b) Show that $\int_{-1}^{1} (1 - x^2)^n \, dx \geq \frac{1}{\sqrt{n}}$. (Hint: for $|x| \leq 1/\sqrt{n}$, use part
(a); for $|x| \geq 1/\sqrt{n}$, just use the fact that $(1 - x^2)$ is positive. It
is also possible to proceed via trigonometric substitution, but I
would not recommend this unless you know what you are doing.)

(c) Prove Lemma 14.8.8. (Hint: choose f(x) to equal $c(1 - x^2)^N$ for
x ∈ [−1,1] and to equal zero for x ∉ [−1,1], where N is a large
number and c is chosen so that f has integral 1, and use
(b).)
Exercise 14.8.3. Let f : R → R be a compactly supported, continuous
function. Show that f is bounded and uniformly continuous. (Hint: the
idea is to use Proposition 13.3.2 and Theorem 13.3.5, but one must first
deal with the issue that the domain R of f is non-compact.)


Exercise 14.8.4. Prove Proposition 14.8.11. (Hint: to show that f * g is
continuous, use Exercise 14.8.3.)


Exercise 14.8.5. Let f : R → R and g : R → R be continuous, com-
pactly supported functions. Suppose that f is supported on the interval
[0,1], and g is constant on the interval [0,2] (i.e., there is a real number
c such that g(x) = c for all x ∈ [0,2]). Show that the convolution f * g
is constant on the interval [1,2].


Exercise 14.8.6. (a) Let g be an (ε, δ)-approximation to the identity.
Show that $1 - 2\varepsilon \leq \int_{[-\delta,\delta]} g \leq 1$.


(b) Prove Lemma 14.8.14. (Hint: begin with the identity


$$\begin{aligned}
f * g(x) = \int_{-\infty}^{\infty} f(x - y)\, g(y)\, dy &= \int_{[-\delta,\delta]} f(x - y)\, g(y)\, dy \\
&\quad + \int_{[\delta,1]} f(x - y)\, g(y)\, dy + \int_{[-1,-\delta]} f(x - y)\, g(y)\, dy.
\end{aligned}$$


The idea is to show that the first integral is close to f(x), and
that the second and third integrals are very small. To achieve the
former task, use (a) and the fact that f(x) and f(x − y) are within
ε of each other; to achieve the latter task, use property (c) of the
approximation to the identity and the fact that f is bounded.)


Exercise 14.8.7. Prove Corollary 14.8.15. (Hint: combine Exercise 14.8.3, 
Lemma 14.8.8, Lemma 14.8.13, and Lemma 14.8.14.) 

Exercise 14.8.8. Let f : [0,1] → R be a continuous function, and suppose
that $\int_{[0,1]} f(x)\, x^n \, dx = 0$ for all non-negative integers n = 0,1,2,....
Show that f must be the zero function f = 0. (Hint: first show that
$\int_{[0,1]} f(x) P(x) \, dx = 0$ for all polynomials P. Then, using the Weierstrass
approximation theorem, show that $\int_{[0,1]} f(x) f(x) \, dx = 0$.)


Chapter 15 


Power series 


15.1 Formal power series 


We now discuss an important subclass of series of functions, that 
of power series. As in earlier chapters, we begin by introducing 
the notion of a formal power series, and then focus in later sections 
on when the series converges to a meaningful function, and what 
one can say about the function obtained in this manner. 


Definition 15.1.1 (Formal power series). Let a be a real number.
A formal power series centered at a is any series of the form


$$\sum_{n=0}^{\infty} c_n (x - a)^n$$


where $c_0, c_1, \ldots$ is a sequence of real numbers (not depending on
x); we refer to $c_n$ as the n-th coefficient of this series. Note that
each term $c_n (x - a)^n$ in this series is a function of a real variable
x.


Example 15.1.2. The series $\sum_{n=0}^{\infty} n! (x - 2)^n$ is a formal power
series centered at 2. The series $\sum_{n=0}^{\infty} 2^x (x - 3)^n$ is not a formal
power series, since the coefficients $2^x$ depend on x.


We call these power series formal because we do not yet as- 
sume that these series converge for any x. However, these series 
are automatically guaranteed to converge when x = a (why?). 




In general, the closer x gets to a, the easier it is for this series 
to converge. To make this more precise, we need the following 
definition. 


Definition 15.1.3 (Radius of convergence). Let $\sum_{n=0}^{\infty} c_n (x - a)^n$
be a formal power series. We define the radius of convergence R
of this series to be the quantity


$$R := \frac{1}{\limsup_{n \to \infty} |c_n|^{1/n}}$$


where we adopt the convention that $\frac{1}{0} = +\infty$ and $\frac{1}{+\infty} = 0$.


Remark 15.1.4. Each number $|c_n|^{1/n}$ is non-negative, so the
limit $\limsup_{n \to \infty} |c_n|^{1/n}$ can take on any value from 0 to +∞,
inclusive. Thus R can also take on any value between 0 and +∞
inclusive (in particular it is not necessarily a real number). Note
that the radius of convergence always exists, even if the sequence
$|c_n|^{1/n}$ is not convergent, because the lim sup of any sequence
always exists (though it might be +∞ or −∞).


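Example 15.1.5. The series $\sum_{n=0}^{\infty} n (-2)^n (x - 3)^n$ has radius of
convergence


$$\frac{1}{\limsup_{n \to \infty} |n (-2)^n|^{1/n}} = \frac{1}{\limsup_{n \to \infty} 2 n^{1/n}} = \frac{1}{2}.$$


The series $\sum_{n=0}^{\infty} 2^{n^2} (x + 2)^n$ has radius of convergence


$$\frac{1}{\limsup_{n \to \infty} |2^{n^2}|^{1/n}} = \frac{1}{\limsup_{n \to \infty} 2^{n}} = \frac{1}{+\infty} = 0.$$


The series $\sum_{n=0}^{\infty} 2^{-n^2} (x + 2)^n$ has radius of convergence


$$\frac{1}{\limsup_{n \to \infty} |2^{-n^2}|^{1/n}} = \frac{1}{\limsup_{n \to \infty} 2^{-n}} = \frac{1}{0} = +\infty.$$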
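The three computations in Example 15.1.5 can be sanity-checked numerically. The sketch below evaluates $|c_n|^{1/n}$ for a few large n, working with log|c_n| to avoid overflow. The function name `root_estimate` and the sample indices are assumptions for illustration, and a value at finite n only suggests the lim sup; it does not establish it.

```python
import math


def root_estimate(log_abs_coeff, n):
    """Return |c_n|^(1/n), computed from log|c_n| to avoid overflow."""
    return math.exp(log_abs_coeff(n) / n)


# log|c_n| for the three series of Example 15.1.5.
examples = {
    "n(-2)^n": lambda n: math.log(n) + n * math.log(2.0),
    "2^(n^2)": lambda n: n * n * math.log(2.0),
    "2^(-n^2)": lambda n: -n * n * math.log(2.0),
}

for name, log_c in examples.items():
    # The radius of convergence is the reciprocal of the lim sup of these values.
    print(name, [root_estimate(log_c, n) for n in (10, 100, 1000)])
```

The first list approaches 2, the second grows without bound, and the third decays to 0, matching the radii 1/2, 0 and +∞ respectively.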


The significance of the radius of convergence is the following. 


Theorem 15.1.6. Let $\sum_{n=0}^{\infty} c_n (x - a)^n$ be a formal power series,
and let R be its radius of convergence.


(a) (Divergence outside of the radius of convergence) If $x \in \mathbf{R}$
is such that |x − a| > R, then the series $\sum_{n=0}^{\infty} c_n (x - a)^n$ is
divergent for that value of x.


(b) (Convergence inside the radius of convergence) If $x \in \mathbf{R}$ is
such that |x − a| < R, then the series $\sum_{n=0}^{\infty} c_n (x - a)^n$ is
absolutely convergent for that value of x.


For parts (c)-(e) we assume that R > 0 (i.e., the series con-
verges at at least one other point than x = a). Let $f : (a - R, a + R) \to \mathbf{R}$
be the function $f(x) := \sum_{n=0}^{\infty} c_n (x - a)^n$; this function is
guaranteed to exist by (b).


(c) (Uniform convergence on compact sets) For any 0 < r < R,
the series $\sum_{n=0}^{\infty} c_n (x - a)^n$ converges uniformly to f on the
compact interval [a − r, a + r]. In particular, f is continuous
on (a − R, a + R).


(d) (Differentiation of power series) The function f is differ-
entiable on (a − R, a + R), and for any 0 < r < R, the
series $\sum_{n=1}^{\infty} n c_n (x - a)^{n-1}$ converges uniformly to f′ on the
interval [a − r, a + r].


(e) (Integration of power series) For any closed interval [y, z]
contained in (a − R, a + R), we have


$$\int_{[y,z]} f = \sum_{n=0}^{\infty} c_n \frac{(z - a)^{n+1} - (y - a)^{n+1}}{n + 1}.$$


Proof. See Exercise 15.1.1. □


Parts (a) and (b) of the above theorem give another
way to find the radius of convergence, by using your favorite con- 
vergence test to work out the range of x for which the power series 
converges: 


Example 15.1.7. Consider the power series $\sum_{n=0}^{\infty} n (x - 1)^n$. The
ratio test shows that this series converges when |x − 1| < 1 and
diverges when |x − 1| > 1 (why?). Thus the only possible value for
the radius of convergence is R = 1 (if R < 1, then we have con-
tradicted Theorem 15.1.6(a); if R > 1, then we have contradicted
Theorem 15.1.6(b)).


Remark 15.1.8. Theorem 15.1.6 is silent on what happens when
|x − a| = R, i.e., at the points a − R and a + R. Indeed, one can
have either convergence or divergence at those points; see Exercise
15.1.2.


Remark 15.1.9. Note that while Theorem 15.1.6 assures us that
the power series $\sum_{n=0}^{\infty} c_n (x - a)^n$ will converge pointwise on the
interval (a − R, a + R), it need not converge uniformly on that
interval (see Exercise 15.1.2(e)). On the other hand, Theorem
15.1.6(c) assures us that the power series will converge uniformly
on any smaller interval [a − r, a + r]. In particular, being uniformly
convergent on every closed subinterval of (a − R, a + R) is not enough
to guarantee being uniformly convergent on all of (a − R, a + R).


Exercise 15.1.1. Prove Theorem 15.1.6. (Hints: for (a) and (b), use the
root test (Theorem 7.5.1). For (c), use the Weierstrass M-test (Theorem
14.5.7). For (d), use Theorem 14.7.1. For (e), use Corollary 14.6.2.)


Exercise 15.1.2. Give examples of a formal power series $\sum_{n=0}^{\infty} c_n x^n$ cen-
tered at 0 with radius of convergence 1, which


(a) diverges at both x = 1 and x = −1;
(b) diverges at x = 1 but converges at x = −1;
(c) converges at x = 1 but diverges at x = −1;
(d) converges at both x = 1 and x = −1;
(e) converges pointwise on (−1,1), but does not converge uniformly
on (−1,1).


15.2 Real analytic functions 


A function f(x) which is lucky enough to be representable as a 
power series has a special name; it is a real analytic function. 




Definition 15.2.1 (Real analytic functions). Let E be a subset
of R, and let f : E → R be a function. If a is an interior point
of E, we say that f is real analytic at a if there exists an open
interval (a − r, a + r) in E for some r > 0 such that there exists
a power series $\sum_{n=0}^{\infty} c_n (x - a)^n$ centered at a which has a radius
of convergence greater than or equal to r, and which converges to
f on (a − r, a + r). If E is an open set, and f is real analytic at
every point a of E, we say that f is real analytic on E.


Example 15.2.2. Consider the function f : R\{1} → R defined
by f(x) := 1/(1 − x). This function is real analytic at 0 because
we have a power series $\sum_{n=0}^{\infty} x^n$ centered at 0 which converges to
1/(1 − x) = f(x) on the interval (−1,1). This function is also real
analytic at 2 because we have a power series $\sum_{n=0}^{\infty} (-1)^{n+1} (x - 2)^n$
which converges to $\frac{-1}{1 - (-(x - 2))} = \frac{1}{1 - x} = f(x)$ on the interval (1,3)
(why? use Lemma 7.3.3). In fact this function is real analytic on
all of R\{1}; see Exercise 15.2.2.
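The expansion at 2 in Example 15.2.2 can be checked numerically by comparing partial sums with 1/(1 − x) at points of (1,3). The function names and the sample points below are assumptions chosen for illustration.

```python
def partial_sum_at_2(x, terms):
    """Partial sum of the series sum_{n>=0} (-1)^(n+1) (x - 2)^n."""
    return sum((-1) ** (n + 1) * (x - 2.0) ** n for n in range(terms))


def f(x):
    return 1.0 / (1.0 - x)


for x, terms in ((2.5, 60), (1.2, 400)):
    # Both sample points lie in (1, 3), where the series converges.
    print(x, partial_sum_at_2(x, terms), f(x))
```

More terms are needed at x = 1.2 than at x = 2.5 because |x − 2| is closer to the radius of convergence 1 there.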


Remark 15.2.3. The notion of being real analytic is closely re- 
lated to another notion, that of being complex analytic, but this 
is a topic for complex analysis, and will not be discussed here. 


We now discuss which functions are real analytic. From The-
orem 15.1.6(c) and (d) we see that if f is real analytic at a point
a, then f is both continuous and differentiable on (a − r, a + r) for
some r > 0. We can in fact say more:


Definition 15.2.4 (k-times differentiability). Let E be a subset of
R. We say a function f : E → R is once differentiable on E iff it is
differentiable. More generally, for any k ≥ 2 we say that f : E →
R is k times differentiable on E, or just k times differentiable, iff
f is differentiable, and f′ is k − 1 times differentiable. If f is k
times differentiable, we define the k-th derivative $f^{(k)}$ : E → R by
the recursive rule $f^{(1)} := f'$, and $f^{(k)} := (f^{(k-1)})'$ for all k ≥ 2. We
also define $f^{(0)} := f$ (this is f differentiated 0 times), and we allow
every function to be zero times differentiable (since clearly $f^{(0)}$
exists for every f). A function is said to be infinitely differentiable
(or smooth) iff it is k times differentiable for every k ≥ 0.




Example 15.2.5. The function $f(x) := |x|^3$ is twice differentiable
on R, but not three times differentiable (why?). Indeed, $f^{(2)} =
f'' = 6|x|$, which is not differentiable at 0.


Proposition 15.2.6 (Real analytic functions are k-times differ-
entiable). Let E be a subset of R, let a be an interior point of E,
and let f be a function which is real analytic at a, thus there
is an r > 0 for which we have the power series expansion


$$f(x) = \sum_{n=0}^{\infty} c_n (x - a)^n$$


for all x ∈ (a − r, a + r). Then for every k ≥ 0, the function f is
k-times differentiable on (a − r, a + r), and for each k ≥ 0 the k-th
derivative is given by


$$\begin{aligned}
f^{(k)}(x) &= \sum_{n=0}^{\infty} c_{n+k} (n + 1)(n + 2) \cdots (n + k) (x - a)^n \\
&= \sum_{n=0}^{\infty} c_{n+k} \frac{(n + k)!}{n!} (x - a)^n
\end{aligned}$$


for all x ∈ (a − r, a + r).

Proof. See Exercise 15.2.3. □
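The derivative formula of Proposition 15.2.6 can be tested on the geometric series of Example 15.2.2, where every coefficient $c_n$ equals 1 and a = 0. The sketch below sums the series for the k-th derivative and compares it with k!/(1 − x)^(k+1), the k-th derivative of 1/(1 − x) obtained by ordinary calculus. The function name `derivative_series`, the truncation at 200 terms, and the sample point x = 0.3 are assumptions for illustration.

```python
import math


def derivative_series(coeff, k, x, a=0.0, terms=200):
    """Truncated sum over n of c_{n+k} * (n+k)!/n! * (x - a)^n."""
    total = 0.0
    for n in range(terms):
        # (n+k)!/n! is the integer (n+1)(n+2)...(n+k).
        falling = math.factorial(n + k) // math.factorial(n)
        total += coeff(n + k) * falling * (x - a) ** n
    return total


def geometric(n):
    """Coefficients c_n = 1 of the series for 1/(1 - x) around 0."""
    return 1.0


k, x = 3, 0.3
series_value = derivative_series(geometric, k, x)
closed_form = math.factorial(k) / (1.0 - x) ** (k + 1)
print(series_value, closed_form)
```

Evaluating the same series at x = a leaves only the n = 0 term, which gives $f^{(k)}(a) = k!\, c_k$.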


Corollary 15.2.7 (Real analytic functions are infinitely differen-
tiable). Let E be an open subset of R, and let f : E → R be a
real analytic function on E. Then f is infinitely differentiable on
E. Also, all derivatives of f are also real analytic on E.


Proof. For every point a ∈ E and k ≥ 0, we know from Propo-
sition 15.2.6 that f is k-times differentiable at a (we will have
to apply Exercise 10.1.1 k times here, why?). Thus f is k-times
differentiable on E for every k ≥ 0 and is hence infinitely differen-
tiable. Also, from Proposition 15.2.6 we see that each derivative
$f^{(k)}$ of f has a convergent power series expansion at every x ∈ E
and thus $f^{(k)}$ is real analytic. □




Example 15.2.8. Consider the function f : R → R defined by
f(x) := |x|. This function is not differentiable at x = 0, and hence
cannot be real analytic at x = 0. It is however real analytic at
every other point x ∈ R\{0} (why?).


Remark 15.2.9. The converse statement to Corollary 15.2.7 is 
not true; there are infinitely differentiable functions which are not 
real analytic. See Exercise 15.5.4. 


Proposition 15.2.6 has an important corollary, due to Brook 
Taylor (1685-1731). 


Corollary 15.2.10 (Taylor’s formula). Let E be a subset of R,
let a be an interior point of E, and let f : E → R be a function
which is real analytic at a and has the power series expansion


$$f(x) = \sum_{n=0}^{\infty} c_n (x - a)^n$$


for all x ∈ (a − r, a + r) and some r > 0. Then for any integer
k ≥ 0, we have


$$f^{(k)}(a) = k!\, c_k,$$


where k! := 1 × 2 × ... × k (and we adopt the convention that
0! = 1). In particular, we have Taylor’s formula


$$f(x) = \sum_{n=0}^{\infty} \frac{f^{(n)}(a)}{n!} (x - a)^n$$


for all x in (a − r, a + r).


Proof. See Exercise 15.2.4. □


The power series $\sum_{n=0}^{\infty} \frac{f^{(n)}(a)}{n!} (x - a)^n$ is sometimes called the
Taylor series of f around a. Taylor’s formula thus asserts that if 
a function is real analytic, then it is equal to its Taylor series. 


Remark 15.2.11. Note that Taylor’s formula only works for func- 
tions which are real analytic; there are examples of functions which 
are infinitely differentiable but for which Taylor’s theorem fails 
(see Exercise 15.5.4). 




Another important corollary of Taylor’s formula is that a real 
analytic function can have at most one power series at a point: 


Corollary 15.2.12 (Uniqueness of power series). Let E be a sub-
set of R, let a be an interior point of E, and let f : E → R be a
function which is real analytic at a. Suppose that f has two power
series expansions


$$f(x) = \sum_{n=0}^{\infty} c_n (x - a)^n$$


and


$$f(x) = \sum_{n=0}^{\infty} d_n (x - a)^n$$


centered at a, each with a non-zero radius of convergence. Then
$c_n = d_n$ for all n ≥ 0.


Proof. By Corollary 15.2.10, we have $f^{(k)}(a) = k!\, c_k$ for all k ≥ 0.
But we also have $f^{(k)}(a) = k!\, d_k$, by similar reasoning. Since k! is
never zero, we can cancel it and obtain $c_k = d_k$ for all k ≥ 0, as
desired. □


Remark 15.2.13. While a real analytic function has a unique
power series around any given point, it can certainly have different
power series at different points. For instance, the function f(x) :=
$\frac{1}{1 - x}$, defined on R\{1}, has the power series


$$f(x) = \sum_{n=0}^{\infty} x^n$$


around 0, on the interval (−1,1), but also has the power series


$$f(x) = \frac{1}{1 - x} = \frac{2}{1 - 2(x - \frac{1}{2})} = \sum_{n=0}^{\infty} 2 \left( 2 \left( x - \tfrac{1}{2} \right) \right)^n = \sum_{n=0}^{\infty} 2^{n+1} \left( x - \tfrac{1}{2} \right)^n$$


around 1/2, on the interval (0,1) (note that the above power series
has a radius of convergence of 1/2, thanks to the root test). See
also Exercise 15.2.8.
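The two expansions in Remark 15.2.13 can be compared at a point where both converge. The sketch below sums each series at x = 0.3, which lies in both (−1,1) and (0,1), and also looks at the size of a single term of the second series at x = −0.5, which lies outside (0,1). The function names, sample points, and truncation lengths are assumptions for illustration.

```python
def around_zero(x, terms=200):
    """Partial sum of sum_{n>=0} x^n."""
    return sum(x ** n for n in range(terms))


def around_half(x, terms=200):
    """Partial sum of sum_{n>=0} 2^(n+1) (x - 1/2)^n."""
    return sum(2.0 ** (n + 1) * (x - 0.5) ** n for n in range(terms))


x = 0.3
target = 1.0 / (1.0 - x)
print(around_zero(x), around_half(x), target)

# Outside (0, 1) the terms of the second series do not tend to zero.
term_outside = abs(2.0 ** 51 * (-0.5 - 0.5) ** 50)
print(term_outside)
```

Both partial sums agree with 1/(1 − x) at x = 0.3, while at x = −0.5 only the expansion around 0 is usable.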




Exercise 15.2.1. Let n ≥ 0 be an integer, let c, a be real numbers, and
let f be the function $f(x) := c (x - a)^n$. Show that f is infinitely differ-
entiable, and that $f^{(k)}(x) = c \frac{n!}{(n - k)!} (x - a)^{n-k}$ for all integers 0 ≤ k ≤ n.
What happens when k > n?


Exercise 15.2.2. Show that the function f defined in Example 15.2.2 is 
real analytic on all of R\{1}. 


Exercise 15.2.3. Prove Proposition 15.2.6. (Hint: induct on k and use 
Theorem 15.1.6(d)). 


Exercise 15.2.4. Use Proposition 15.2.6 and Exercise 15.2.1 to prove 
Corollary 15.2.10. 


Exercise 15.2.5. Let a, b be real numbers, and let n ≥ 0 be an integer.
Prove the identity


$$(x - a)^n = \sum_{m=0}^{n} \frac{n!}{m!(n - m)!} (b - a)^{n-m} (x - b)^m$$


for any real number x. (Hint: use the binomial formula, Exercise 7.1.4.)
Explain why this identity is consistent with Taylor’s theorem and Exer-
cise 15.2.1. (Note however that Taylor’s theorem cannot be rigorously
applied until one verifies Exercise 15.2.6 below.)


Exercise 15.2.6. Using Exercise 15.2.5, show that every polynomial P(x)
of one variable is real analytic on R.


Exercise 15.2.7. Let m ≥ 0 be an integer, and let 0 < x < r be
real numbers. Use Lemma 7.3.3 to establish the identity


$$\frac{r}{r - x} = \sum_{n=0}^{\infty} x^n r^{-n}$$


for all x ∈ (−r, r). Using Proposition 15.2.6, conclude the identity


$$\frac{r}{(r - x)^{m+1}} = \sum_{n=m}^{\infty} \frac{n!}{m!(n - m)!} x^{n-m} r^{-n}$$


for all integers m ≥ 0 and x ∈ (−r, r). Also explain why the series on
the right-hand side is absolutely convergent.




Exercise 15.2.8. Let E be a subset of R, let a be an interior point of E,
and let f : E → R be a function which is real analytic at a, and has a
power series expansion


$$f(x) = \sum_{n=0}^{\infty} c_n (x - a)^n$$


at a which converges on the interval (a − r, a + r). Let (b − s, b + s) be
any sub-interval of (a − r, a + r) for some s > 0.


(a) Prove that |a − b| ≤ r − s, so in particular |a − b| < r.


(b) Show that for every 0 < ε < r, there exists a C > 0 such that
$|c_n| \leq C (r - \varepsilon)^{-n}$ for all integers n ≥ 0. (Hint: what do we know
about the radius of convergence of the series $\sum_{n=0}^{\infty} c_n (x - a)^n$?)


(c) Show that the numbers $d_0, d_1, \ldots$ given by the formula


$$d_m := \sum_{n=m}^{\infty} \frac{n!}{m!(n - m)!} (b - a)^{n-m} c_n \quad \text{for all integers } m \geq 0$$


are well-defined, in the sense that the above series is absolutely
convergent. (Hint: use (b) and the comparison test, Corollary
7.3.2, followed by Exercise 15.2.7.)


(d) Show that for every 0 < ε < s there exists a C > 0 such that


$$|d_m| \leq C (s - \varepsilon)^{-m}$$


for all integers m ≥ 0. (Hint: use the comparison test, and Exer-
cise 15.2.7.)


(e) Show that the power series $\sum_{m=0}^{\infty} d_m (x - b)^m$ is absolutely con-
vergent for x ∈ (b − s, b + s) and converges to f(x). (You may
need Fubini’s theorem for infinite series, Theorem 8.2.2, as well as
Exercise 15.2.5.)


(f) Conclude that f is real analytic at every point in (a − r, a + r).


15.3 Abel’s theorem 


Let $f(x) = \sum_{n=0}^{\infty} c_n (x - a)^n$ be a power series centered at a with a
radius of convergence 0 < R < ∞ strictly between 0 and infinity.
From Theorem 15.1.6 we know that the power series converges
absolutely whenever |x − a| < R, and diverges when |x − a| > R.
However, at the boundary |x — a| = R the situation is more com- 
plicated; the series may either converge or diverge (see Exercise 
15.1.2). However, if the series does converge at the boundary 
point, then it is reasonably well behaved; in particular, it is con- 
tinuous at that boundary point. 

Theorem 15.3.1 (Abel’s theorem). Let $f(x) = \sum_{n=0}^{\infty} c_n (x - a)^n$
be a power series centered at a with radius of convergence 0 < R <
∞. If the power series converges at a + R, then f is continuous
at a + R, i.e.


$$\lim_{x \to a + R;\ x \in (a - R, a + R)} \sum_{n=0}^{\infty} c_n (x - a)^n = \sum_{n=0}^{\infty} c_n R^n.$$


Similarly, if the power series converges at a − R, then f is con-
tinuous at a − R, i.e.


$$\lim_{x \to a - R;\ x \in (a - R, a + R)} \sum_{n=0}^{\infty} c_n (x - a)^n = \sum_{n=0}^{\infty} c_n (-R)^n.$$


Before we prove Abel’s theorem, we need the following lemma. 


Lemma 15.3.2 (Summation by parts formula). Let $(a_n)_{n=0}^{\infty}$ and
$(b_n)_{n=0}^{\infty}$ be sequences of real numbers which converge to limits A
and B respectively, i.e., $\lim_{n \to \infty} a_n = A$ and $\lim_{n \to \infty} b_n = B$.
Suppose that the sum $\sum_{n=0}^{\infty} (a_{n+1} - a_n) b_n$ is convergent. Then the
sum $\sum_{n=0}^{\infty} a_{n+1} (b_{n+1} - b_n)$ is also convergent, and


$$\sum_{n=0}^{\infty} (a_{n+1} - a_n) b_n = AB - a_0 b_0 - \sum_{n=0}^{\infty} a_{n+1} (b_{n+1} - b_n).$$


Proof. See Exercise 15.3.1. □
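The finite version of the summation by parts formula, in which AB is replaced by $a_{N+1} b_{N+1}$ and both sums stop at n = N, can be verified with exact rational arithmetic. The sequences chosen below and the function name `summation_by_parts_sides` are assumptions for illustration; letting N tend to infinity in this finite identity is the idea suggested by the hint to Exercise 15.3.1.

```python
from fractions import Fraction


def summation_by_parts_sides(a, b, N):
    """Return both sides of the finite identity
    sum_{n=0}^{N} (a_{n+1} - a_n) b_n
        = a_{N+1} b_{N+1} - a_0 b_0 - sum_{n=0}^{N} a_{n+1} (b_{n+1} - b_n)."""
    lhs = sum((a(n + 1) - a(n)) * b(n) for n in range(N + 1))
    rhs = (a(N + 1) * b(N + 1) - a(0) * b(0)
           - sum(a(n + 1) * (b(n + 1) - b(n)) for n in range(N + 1)))
    return lhs, rhs


def a(n):
    """Sample sequence converging to A = 0."""
    return Fraction(1, n + 1)


def b(n):
    """Sample sequence converging to B = 1."""
    return 1 + Fraction(1, 2 ** n)


lhs, rhs = summation_by_parts_sides(a, b, 25)
print(lhs == rhs)
```

Using `Fraction` keeps every term exact, so the two sides agree as rational numbers rather than only up to rounding.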


Remark 15.3.3. One should compare this formula with the more 
well-known integration by parts formula 


$$\int_0^{\infty} f'(x) g(x) \, dx = f(x) g(x) \Big|_0^{\infty} - \int_0^{\infty} f(x) g'(x) \, dx,$$


see Proposition 11.10.1. 




Proof of Abel’s theorem. It will suffice to prove the first claim, i.e.,
that


$$\lim_{x \to a + R;\ x \in (a - R, a + R)} \sum_{n=0}^{\infty} c_n (x - a)^n = \sum_{n=0}^{\infty} c_n R^n$$


whenever the sum $\sum_{n=0}^{\infty} c_n R^n$ converges; the second claim will
then follow (why?) by replacing $c_n$ by $(-1)^n c_n$ in the above claim.
If we make the substitutions $d_n := c_n R^n$ and $y := \frac{x - a}{R}$, then the
above claim can be rewritten as


$$\lim_{y \to 1;\ y \in (-1,1)} \sum_{n=0}^{\infty} d_n y^n = \sum_{n=0}^{\infty} d_n$$


whenever the sum $\sum_{n=0}^{\infty} d_n$ converges. (Why is this equivalent to
the previous claim?)


Write $D := \sum_{n=0}^{\infty} d_n$, and for every N ≥ 0 write


$$S_N := \Big( \sum_{n=0}^{N-1} d_n \Big) - D$$


so in particular $S_0 = -D$. Then observe that $\lim_{N \to \infty} S_N = 0$,
and that $d_n = S_{n+1} - S_n$. Thus for any y ∈ (−1,1) we have


$$\sum_{n=0}^{\infty} d_n y^n = \sum_{n=0}^{\infty} (S_{n+1} - S_n) y^n.$$


Applying the summation by parts formula (Lemma 15.3.2), and
noting that $\lim_{n \to \infty} y^n = 0$, we obtain


$$\sum_{n=0}^{\infty} d_n y^n = -S_0 y^0 - \sum_{n=0}^{\infty} S_{n+1} (y^{n+1} - y^n).$$


Observe that $-S_0 y^0 = +D$. Thus to finish the proof of Abel’s
theorem, it will suffice to show that


$$\lim_{y \to 1;\ y \in (-1,1)} \sum_{n=0}^{\infty} S_{n+1} (y^{n+1} - y^n) = 0.$$




Since y converges to 1, we may as well restrict y to [0,1) instead
of (−1,1); in particular we may take y to be non-negative.

From the triangle inequality for series (Proposition 7.2.9), we
have


$$\begin{aligned}
\Big| \sum_{n=0}^{\infty} S_{n+1} (y^{n+1} - y^n) \Big| &\leq \sum_{n=0}^{\infty} |S_{n+1} (y^{n+1} - y^n)| \\
&= \sum_{n=0}^{\infty} |S_{n+1}| (y^n - y^{n+1}),
\end{aligned}$$


so by the squeeze test (Corollary 6.4.14) it suffices to show that


$$\lim_{y \to 1;\ y \in [0,1)} \sum_{n=0}^{\infty} |S_{n+1}| (y^n - y^{n+1}) = 0.$$


The expression $\sum_{n=0}^{\infty} |S_{n+1}| (y^n - y^{n+1})$ is clearly non-negative, so
it will suffice to show that


$$\limsup_{y \to 1;\ y \in [0,1)} \sum_{n=0}^{\infty} |S_{n+1}| (y^n - y^{n+1}) = 0.$$


Let ε > 0. Since $S_n$ converges to 0, there exists an N such that
$|S_n| \leq \varepsilon$ for all n > N. Thus we have


$$\sum_{n=0}^{\infty} |S_{n+1}| (y^n - y^{n+1}) \leq \sum_{n=0}^{N} |S_{n+1}| (y^n - y^{n+1}) + \sum_{n=N+1}^{\infty} \varepsilon (y^n - y^{n+1}).$$


The last summation is a telescoping series, which sums to $\varepsilon y^{N+1}$
(see Lemma 7.2.15, recalling from Lemma 6.5.2 that $y^n \to 0$ as
n → ∞), and thus


$$\sum_{n=0}^{\infty} |S_{n+1}| (y^n - y^{n+1}) \leq \sum_{n=0}^{N} |S_{n+1}| (y^n - y^{n+1}) + \varepsilon y^{N+1}.$$


Now take limits as y → 1. Observe that $y^n - y^{n+1} \to 0$ as y → 1
for every n ∈ {0, 1, ..., N}. Since we can interchange limits and
finite sums (Exercise 7.1.5), we thus have


$$\limsup_{y \to 1;\ y \in [0,1)} \sum_{n=0}^{\infty} |S_{n+1}| (y^n - y^{n+1}) \leq \varepsilon.$$


But ε > 0 was arbitrary, and thus we must have


$$\limsup_{y \to 1;\ y \in [0,1)} \sum_{n=0}^{\infty} |S_{n+1}| (y^n - y^{n+1}) = 0$$


since the left-hand side must be non-negative. The claim follows. □
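Abel's theorem can be observed numerically on the series $\sum_{n=1}^{\infty} (-1)^{n+1} y^n / n$, which has radius of convergence 1 and converges at y = 1 by the alternating series test. The sketch below evaluates the series at points approaching 1 from inside (−1,1) and measures the distance to the boundary sum. The value ln 2 for that boundary sum is a standard fact that is not established in this section, so it is treated here as an assumption, as are the function name and the truncation length.

```python
import math


def log_series(y, terms=20000):
    """Partial sum of sum_{n>=1} (-1)^(n+1) y^n / n."""
    total = 0.0
    sign = 1.0
    power = y
    for n in range(1, terms + 1):
        total += sign * power / n
        sign = -sign
        power *= y
    return total


# Assumed (not proved here): the series at y = 1 sums to ln 2.
boundary_sum = math.log(2.0)

gaps = [abs(log_series(y) - boundary_sum) for y in (0.9, 0.99, 0.999)]
print(gaps)
```

The gaps shrink as y approaches 1, which is the continuity at the boundary point that the theorem asserts.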


Exercise 15.3.1. Prove Lemma 15.3.2. (Hint: first work out the relation-
ship between the partial sums $\sum_{n=0}^{N} (a_{n+1} - a_n) b_n$ and
$\sum_{n=0}^{N} a_{n+1} (b_{n+1} - b_n)$.)


15.4 Multiplication of power series 


We now show that the product of two real analytic functions is 
again real analytic. 
Theorem 15.4.1. Let f : (a − r, a + r) → R and g : (a − r, a +
r) → R be functions analytic on (a − r, a + r), with power series
expansions


$$f(x) = \sum_{n=0}^{\infty} c_n (x - a)^n$$


and


$$g(x) = \sum_{n=0}^{\infty} d_n (x - a)^n$$


respectively. Then fg : (a − r, a + r) → R is also analytic on
(a − r, a + r), with power series expansion


$$f(x) g(x) = \sum_{n=0}^{\infty} e_n (x - a)^n$$


where $e_n := \sum_{m=0}^{n} c_m d_{n-m}$.
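The coefficient formula $e_n = \sum_{m=0}^{n} c_m d_{n-m}$ is easy to compute for truncated coefficient lists. The sketch below applies it to the geometric series for 1/(1 − x) multiplied by itself, and then compares a partial sum of the product series with 1/(1 − x)² at a sample point. The function name `cauchy_product`, the truncation length, and the sample point are assumptions for illustration.

```python
def cauchy_product(c, d):
    """Return e_0, ..., e_{N-1} with e_n = sum_{m=0}^{n} c[m] * d[n - m],
    where N is the shorter of the two coefficient lists."""
    size = min(len(c), len(d))
    return [sum(c[m] * d[n - m] for m in range(n + 1)) for n in range(size)]


# Coefficients of 1/(1 - x) around 0 are all equal to 1.
ones = [1] * 60
product_coeffs = cauchy_product(ones, ones)
print(product_coeffs[:6])

x = 0.5
partial = sum(e * x ** n for n, e in enumerate(product_coeffs))
print(partial, 1.0 / (1.0 - x) ** 2)
```

Only the first N coefficients of each factor are needed to obtain the first N coefficients of the product, since $e_n$ involves indices no larger than n.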




Remark 15.4.2. The sequence $(e_n)_{n=0}^{\infty}$ is sometimes referred to
as the convolution of the sequences $(c_n)_{n=0}^{\infty}$ and $(d_n)_{n=0}^{\infty}$; it is
closely related (though not identical) to the notion of convolution
introduced in Definition 14.8.9.


Proof. We have to show that the series $\sum_{n=0}^{\infty} e_n (x - a)^n$ converges
to f(x)g(x) for all x ∈ (a − r, a + r). Now fix x to be any point
in (a − r, a + r). By Theorem 15.1.6, we see that both f and
g have radii of convergence at least r. In particular, the series
$\sum_{n=0}^{\infty} c_n (x - a)^n$ and $\sum_{n=0}^{\infty} d_n (x - a)^n$ are absolutely convergent.
Thus if we define


$$C := \sum_{n=0}^{\infty} |c_n (x - a)^n|$$


and


$$D := \sum_{n=0}^{\infty} |d_n (x - a)^n|$$


then C and D are both finite.
For any N ≥ 0, consider the partial sum


$$\sum_{n=0}^{N} \sum_{m=0}^{\infty} |c_m (x - a)^m d_n (x - a)^n|.$$


We can rewrite this as


$$\sum_{n=0}^{N} |d_n (x - a)^n| \sum_{m=0}^{\infty} |c_m (x - a)^m|,$$


which by definition of C is equal to


$$\sum_{n=0}^{N} |d_n (x - a)^n| \, C,$$


which by definition of D is less than or equal to DC. Thus the
above partial sums are bounded by DC for every N. In particular,
the series


$$\sum_{n=0}^{\infty} \sum_{m=0}^{\infty} |c_m (x - a)^m d_n (x - a)^n|$$




is convergent, which means that the sum


$$\sum_{n=0}^{\infty} \sum_{m=0}^{\infty} c_m (x - a)^m d_n (x - a)^n$$


is absolutely convergent.
Let us now compute this sum in two ways. First of all, we can
pull the $d_n (x - a)^n$ factor out of the $\sum_{m=0}^{\infty}$ summation, to obtain


$$\sum_{n=0}^{\infty} d_n (x - a)^n \sum_{m=0}^{\infty} c_m (x - a)^m.$$


By our formula for f(x), this is equal to


$$\sum_{n=0}^{\infty} d_n (x - a)^n f(x);$$


by our formula for g(x), this is equal to f(x)g(x). Thus


$$f(x) g(x) = \sum_{n=0}^{\infty} \sum_{m=0}^{\infty} c_m (x - a)^m d_n (x - a)^n.$$


Now we compute this sum in a different way. We rewrite it as


$$f(x) g(x) = \sum_{n=0}^{\infty} \sum_{m=0}^{\infty} c_m d_n (x - a)^{n+m}.$$


By Fubini’s theorem for series (Theorem 8.2.2), because the series
was absolutely convergent, we may rewrite it as


$$f(x) g(x) = \sum_{m=0}^{\infty} \sum_{n=0}^{\infty} c_m d_n (x - a)^{n+m}.$$


Now make the substitution n′ := n + m, to rewrite this as


$$f(x) g(x) = \sum_{m=0}^{\infty} \sum_{n'=m}^{\infty} c_m d_{n'-m} (x - a)^{n'}.$$




If we adopt the convention that $d_j = 0$ for all negative $j$, then this is equal to
$$f(x)g(x) = \sum_{m=0}^\infty \sum_{n'=0}^\infty c_m d_{n'-m} (x-a)^{n'}.$$
Applying Fubini's theorem again, we obtain
$$f(x)g(x) = \sum_{n'=0}^\infty \sum_{m=0}^\infty c_m d_{n'-m} (x-a)^{n'},$$
which we can rewrite as
$$f(x)g(x) = \sum_{n'=0}^\infty (x-a)^{n'} \sum_{m=0}^\infty c_m d_{n'-m}.$$
Since $d_j$ was $0$ when $j$ is negative, we can rewrite this as
$$f(x)g(x) = \sum_{n'=0}^\infty (x-a)^{n'} \sum_{m=0}^{n'} c_m d_{n'-m},$$
which by definition of $e$ is
$$f(x)g(x) = \sum_{n'=0}^\infty e_{n'} (x-a)^{n'},$$
as desired. □


15.5 The exponential and logarithm functions 


We can now use the machinery developed in the last few sections
to develop a rigorous foundation for many standard functions
used in mathematics. We begin with the exponential function.


Definition 15.5.1 (Exponential function). For every real number $x$, we define the exponential function $\exp(x)$ to be the real number
$$\exp(x) := \sum_{n=0}^\infty \frac{x^n}{n!}.$$


Theorem 15.5.2 (Basic properties of exponential).

(a) For every real number $x$, the series $\sum_{n=0}^\infty \frac{x^n}{n!}$ is absolutely convergent. In particular, $\exp(x)$ exists and is real for every $x \in \mathbf{R}$, the power series $\sum_{n=0}^\infty \frac{x^n}{n!}$ has an infinite radius of convergence, and exp is a real analytic function on $(-\infty, \infty)$.

(b) exp is differentiable on $\mathbf{R}$, and for every $x \in \mathbf{R}$, $\exp'(x) = \exp(x)$.

(c) exp is continuous on $\mathbf{R}$, and for every interval $[a,b]$, we have $\int_{[a,b]} \exp(x)\,dx = \exp(b) - \exp(a)$.

(d) For every $x, y \in \mathbf{R}$, we have $\exp(x+y) = \exp(x)\exp(y)$.

(e) We have $\exp(0) = 1$. Also, for every $x \in \mathbf{R}$, $\exp(x)$ is positive, and $\exp(-x) = 1/\exp(x)$.

(f) exp is strictly monotone increasing: in other words, if $x, y$ are real numbers, then we have $\exp(y) > \exp(x)$ if and only if $y > x$.

Proof. See Exercise 15.5.1. □
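
Several of these properties can be checked numerically from the defining series alone. The sketch below is an informal sanity check in Python, not a proof; the function name `exp_partial` and the choice of 40 terms are our own assumptions, and floating-point arithmetic only approximates the real numbers. It evaluates partial sums of $\sum_{n=0}^\infty x^n/n!$ and compares both sides of properties (d) and (e).

```python
def exp_partial(x, terms=40):
    """Partial sum of x**n / n! for n = 0, ..., terms - 1."""
    total, term = 0.0, 1.0  # term holds x**n / n! for the current n
    for n in range(terms):
        total += term
        term *= x / (n + 1)  # turn x**n / n! into x**(n+1) / (n+1)!
    return total


x, y = 0.7, -1.3

# Property (d): exp(x + y) = exp(x) * exp(y), up to rounding error.
additivity_gap = abs(exp_partial(x + y) - exp_partial(x) * exp_partial(y))

# Property (e): exp(-x) = 1 / exp(x), up to rounding error.
reciprocal_gap = abs(exp_partial(-x) - 1.0 / exp_partial(x))

print(additivity_gap, reciprocal_gap)
```

Updating `term` multiplicatively avoids forming $x^n$ and $n!$ separately, both of which grow quickly; this is a design convenience and not something the theorem requires.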


One can write the exponential function in a more compact
form, introducing the famous Euler's number $e = 2.71828183\ldots$, also
known as the base of the natural logarithm:


Definition 15.5.3 (Euler's number). The number $e$ is defined to be
$$e := \exp(1) = \sum_{n=0}^\infty \frac{1}{n!} = \frac{1}{0!} + \frac{1}{1!} + \frac{1}{2!} + \frac{1}{3!} + \ldots.$$


Proposition 15.5.4. For every real number $x$, we have $\exp(x) = e^x$.


Proof. See Exercise 15.5.3. □




In light of this proposition we can and will use $e^x$ and $\exp(x)$ interchangeably.

Since $e > 1$ (why?), we see that $e^x \to +\infty$ as $x \to +\infty$, and $e^x \to 0$ as $x \to -\infty$. From this and the intermediate value theorem (Theorem 9.7.1) we see that the range of the function exp is $(0,\infty)$. Since exp is increasing, it is injective, and hence exp is a bijection from $\mathbf{R}$ to $(0,\infty)$, and thus has an inverse from $(0,\infty) \to \mathbf{R}$. This inverse has a name:


Definition 15.5.5 (Logarithm). We define the natural logarithm function $\log : (0,\infty) \to \mathbf{R}$ (also called $\ln$) to be the inverse of the exponential function. Thus $\exp(\log(x)) = x$ and $\log(\exp(x)) = x$.


Since exp is continuous and strictly monotone increasing, we 
see that log is also continuous and strictly monotone increasing 
(see Proposition 9.8.3). Since exp is also differentiable, and the 
derivative is never zero, we see from the inverse function theorem 
(Theorem 10.4.2) that log is also differentiable. We list some other 
properties of the natural logarithm below. 


Theorem 15.5.6 (Logarithm properties).

(a) For every $x \in (0,\infty)$, we have $\ln'(x) = \frac{1}{x}$. In particular, by the fundamental theorem of calculus, we have $\int_{[a,b]} \frac{1}{x}\,dx = \ln(b) - \ln(a)$ for any interval $[a,b]$ in $(0,\infty)$.

(b) We have $\ln(xy) = \ln(x) + \ln(y)$ for all $x, y \in (0,\infty)$.

(c) We have $\ln(1) = 0$ and $\ln(1/x) = -\ln(x)$ for all $x \in (0,\infty)$.

(d) For any $x \in (0,\infty)$ and $y \in \mathbf{R}$, we have $\ln(x^y) = y\ln(x)$.

(e) For any $x \in (-1,1)$, we have
$$\ln(1-x) = -\sum_{n=1}^\infty \frac{x^n}{n}.$$
In particular, $\ln$ is analytic at $1$, with the power series expansion
$$\ln(x) = \sum_{n=1}^\infty \frac{(-1)^{n+1}}{n}(x-1)^n$$
n=1 




for $x \in (0,2)$, with radius of convergence $1$.


Proof. See Exercise 15.5.5. □


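Example 15.5.7. We now give a modest application of Abel's theorem (Theorem 15.3.1): from the alternating series test we see that $\sum_{n=1}^\infty \frac{(-1)^{n+1}}{n}$ is convergent. By Abel's theorem we thus see that
$$\sum_{n=1}^\infty \frac{(-1)^{n+1}}{n} = \lim_{x \to 2} \sum_{n=1}^\infty \frac{(-1)^{n+1}}{n}(x-1)^n = \lim_{x \to 2} \ln(x) = \ln(2),$$
thus we have the formula
$$\ln(2) = 1 - \frac{1}{2} + \frac{1}{3} - \frac{1}{4} + \ldots.$$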
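
To get a feel for how slowly this series approaches $\ln(2)$, one can tabulate a few partial sums. The following Python sketch is purely illustrative; the function name `alternating_harmonic` is our own, and `math.log` is used only as an independent reference value for $\ln(2)$.

```python
import math


def alternating_harmonic(n_terms):
    """Partial sum of (-1)**(n+1) / n for n = 1, ..., n_terms."""
    return sum((-1) ** (n + 1) / n for n in range(1, n_terms + 1))


# The error after N terms is at most the first omitted term, 1/(N+1),
# as is usual for an alternating series with decreasing terms.
for n_terms in (10, 100, 1000):
    partial = alternating_harmonic(n_terms)
    print(n_terms, partial, abs(partial - math.log(2)))
```

The error shrinks only about as fast as $1/N$, so this formula is a poor way to compute $\ln(2)$ in practice, even though it is a pleasant consequence of Abel's theorem.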


Exercise 15.5.1. Prove Theorem 15.5.2. (Hints: for part (a), use the
ratio test. For parts (b) and (c), use Theorem 15.1.6. For part (d), use Theorem
15.4.1. For part (e), use part (d). For part (f), use part (d), and prove
that $\exp(x) > 1$ when $x$ is positive. You may find the binomial formula
from Exercise 7.1.4 to be useful.)


Exercise 15.5.2. Show that for every integer $n \geq 3$, we have
$$0 < \frac{1}{(n+1)!} + \frac{1}{(n+2)!} + \ldots < \frac{1}{n!}.$$
(Hint: first show that $(n+k)! > 2^k n!$ for all $k = 1, 2, 3, \ldots$.) Conclude
that $n!e$ is not an integer for every $n \geq 3$. Deduce from this that $e$ is
irrational. (Hint: prove by contradiction.)


Exercise 15.5.3. Prove Proposition 15.5.4. (Hint: first prove the claim
when $x$ is a natural number. Then prove it when $x$ is an integer. Then
prove it when $x$ is a rational number. Then use the fact that real numbers
are the limits of rational numbers to prove it for all real numbers. You
may find the exponent laws (Proposition 6.7.3) to be useful.)


Exercise 15.5.4. Let $f : \mathbf{R} \to \mathbf{R}$ be the function defined by setting
$f(x) := \exp(-1/x)$ when $x > 0$, and $f(x) := 0$ when $x \leq 0$. Prove that
$f$ is infinitely differentiable, and $f^{(k)}(0) = 0$ for every integer $k \geq 0$, but
that $f$ is not real analytic at $0$.




Exercise 15.5.5. Prove Theorem 15.5.6. (Hints: for part (a), use the
inverse function theorem (Theorem 10.4.2) or the chain rule (Theorem
10.1.15). For parts (b), (c) and (d), use Theorem 15.5.2 and the exponent laws
(Proposition 6.7.3). For part (e), start with the geometric series formula
(Lemma 7.3.3) and integrate using Theorem 15.1.6.)


Exercise 15.5.6. Prove that the natural logarithm function is real analytic
on $(0, +\infty)$.


Exercise 15.5.7. Let $f : \mathbf{R} \to (0,\infty)$ be a positive, real analytic function
such that $f'(x) = f(x)$ for all $x \in \mathbf{R}$. Show that $f(x) = Ce^x$ for some
positive constant $C$; justify your reasoning. (Hint: there are basically
three different proofs available. One proof uses the logarithm function,
another proof uses the function $e^{-x}$, and a third proof uses power series.
Of course, you only need to supply one proof.)


Exercise 15.5.8. Let $m > 0$ be an integer. Show that
$$\lim_{x \to +\infty} \frac{e^x}{x^m} = +\infty.$$
(Hint: what happens to the ratio between $e^{x+1}/(x+1)^m$ and $e^x/x^m$ as
$x \to +\infty$?)


Exercise 15.5.9. Let $P(x)$ be a polynomial, and let $c > 0$. Show that
there exists a real number $N > 0$ such that $e^{cx} > |P(x)|$ for all $x > N$;
thus an exponentially growing function, no matter how small the growth
rate $c$, will eventually overtake any given polynomial $P(x)$, no matter
how large. (Hint: use Exercise 15.5.8.)


Exercise 15.5.10. Let $f : (0, +\infty) \times \mathbf{R} \to \mathbf{R}$ be the exponential function
$f(x,y) := x^y$. Show that $f$ is continuous. (Hint: note that Propositions
9.4.10, 9.4.11 only show that $f$ is continuous in each variable, which is
insufficient, as Exercise 13.2.11 shows. The easiest way to proceed is to
write $f(x,y) = \exp(y \ln x)$ and use the continuity of $\exp()$ and $\ln()$. For
an extra challenge, try proving this exercise without using the logarithm
function.)


15.6 A digression on complex numbers 


To proceed further we need the complex number system $\mathbf{C}$, which
is an extension of the real number system $\mathbf{R}$. A full discussion
of this important number system (and in particular the branch of
mathematics known as complex analysis) is beyond the scope of
this text; here, we need the system primarily because of a very
useful mathematical operation, the complex exponential function
$z \mapsto \exp(z)$, which generalizes the real exponential function $x \mapsto \exp(x)$
introduced in the previous section.

Informally, we could define the complex numbers as


Definition 15.6.1 (Informal definition of complex numbers). The
complex numbers $\mathbf{C}$ are the set of all numbers of the form $a + bi$,
where $a, b$ are real numbers and $i$ is a square root of $-1$, $i^2 = -1$.


However, this definition is a little unsatisfactory as it does not
explain how to add, multiply, or compare two complex numbers.
To construct the complex numbers rigorously we will first introduce
a formal version of the complex number $a + bi$, which
we shall temporarily denote as $(a,b)$; this is similar to how in
Chapter 4, when constructing the integers $\mathbf{Z}$, we needed a formal
notion of subtraction $a {-}{-} b$ before the actual notion of subtraction
$a - b$ could be introduced, or how when constructing the rational
numbers, a formal notion of division $a // b$ was needed before it
was superseded by the actual notion $a/b$ of division. It is also
similar to how, in the construction of the real numbers, we defined
a formal limit $\mathrm{LIM}_{n \to \infty} a_n$ before we defined a genuine limit
$\lim_{n \to \infty} a_n$.


Definition 15.6.2 (Formal definition of complex numbers). A
complex number is any pair of the form $(a,b)$, where $a, b$ are real
numbers, thus for instance $(2,4)$ is a complex number. Two complex
numbers $(a,b)$, $(c,d)$ are said to be equal iff $a = c$ and $b = d$,
thus for instance $(2+1, 3+4) = (3,7)$, but $(2,1) \neq (1,2)$ and
$(2,4) \neq (2,-4)$. The set of all complex numbers is denoted $\mathbf{C}$.


At this stage the complex numbers $\mathbf{C}$ are indistinguishable
from the Cartesian product $\mathbf{R}^2 = \mathbf{R} \times \mathbf{R}$ (also known as the Cartesian
plane). However, we will introduce a number of operations
on the complex numbers, notably that of complex multiplication,
which are not normally placed on the Cartesian plane $\mathbf{R}^2$. Thus
one can think of the complex number system $\mathbf{C}$ as the Cartesian
plane $\mathbf{R}^2$ equipped with a number of additional structures. We begin
with the notion of addition and negation. Using the informal
definition of the complex numbers, we expect
$$(a,b) + (c,d) = (a+bi) + (c+di) = (a+c) + (b+d)i = (a+c, b+d)$$
and similarly
$$-(a,b) = -(a+bi) = (-a) + (-b)i = (-a,-b).$$
As these derivations used the informal definition of the complex
numbers, these identities have not yet been rigorously proven.
However we shall simply encode these identities into our complex
number system by defining the notion of addition and negation
by the above rules:


Definition 15.6.3 (Complex addition, negation, and zero). If
$z = (a,b)$ and $w = (c,d)$ are two complex numbers, we define
their sum $z + w$ to be the complex number $z + w := (a+c, b+d)$.
Thus for instance $(2,4) + (3,-1) = (5,3)$. We also define the
negation $-z$ of $z$ to be the complex number $-z := (-a,-b)$, thus
for instance $-(3,-1) = (-3,1)$. We also define the complex zero
$0_{\mathbf{C}}$ to be the complex number $0_{\mathbf{C}} = (0,0)$.


It is easy to see that this notion of addition is well-defined in the
sense that if $z = z'$ and $w = w'$ then $z + w = z' + w'$. Similarly for
negation. The complex addition, negation, and zero operations
obey the usual laws of arithmetic:


Lemma 15.6.4 (The complex numbers are an additive group).
If $z_1, z_2, z_3$ are complex numbers, then we have the commutative
property $z_1 + z_2 = z_2 + z_1$, the associative property $(z_1 + z_2) + z_3 = z_1 + (z_2 + z_3)$,
the identity property $z_1 + 0_{\mathbf{C}} = 0_{\mathbf{C}} + z_1 = z_1$, and
the inverse property $z_1 + (-z_1) = (-z_1) + z_1 = 0_{\mathbf{C}}$.


Proof. See Exercise 15.6.1. □


Next, we define the notion of complex multiplication and reciprocal.
The informal justification of the complex multiplication rule is
$$\begin{aligned}
(a,b) \cdot (c,d) &= (a+bi)(c+di) \\
&= ac + adi + bic + bidi \\
&= (ac - bd) + (ad + bc)i \\
&= (ac - bd, ad + bc)
\end{aligned}$$
since $i^2$ is supposed to equal $-1$. Thus we define


Definition 15.6.5 (Complex multiplication). If $z = (a,b)$ and
$w = (c,d)$ are complex numbers, then we define their product $zw$
to be the complex number $zw := (ac - bd, ad + bc)$. We also
introduce the complex identity $1_{\mathbf{C}} := (1,0)$.


This operation is easily seen to be well-defined, and also obeys
the usual laws of arithmetic:


Lemma 15.6.6. If $z_1, z_2, z_3$ are complex numbers, then we have
the commutative property $z_1 z_2 = z_2 z_1$, the associative property
$(z_1 z_2) z_3 = z_1 (z_2 z_3)$, the identity property $z_1 1_{\mathbf{C}} = 1_{\mathbf{C}} z_1 = z_1$,
and the distributivity properties $z_1(z_2 + z_3) = z_1 z_2 + z_1 z_3$ and
$(z_2 + z_3) z_1 = z_2 z_1 + z_3 z_1$.


Proof. See Exercise 15.6.2. □


The above lemma can also be stated more succinctly, as the 
assertion that C is a commutative ring. As is usual, we now write 
z — w as shorthand for z + (—w). 

We now identify the real numbers $\mathbf{R}$ with a subset of the complex
numbers $\mathbf{C}$ by identifying any real number $x$ with the complex
number $(x,0)$, thus $x = (x,0)$. Note that this identification
is consistent with equality (thus $x = y$ iff $(x,0) = (y,0)$), with
addition ($x_1 + x_2 = x_3$ iff $(x_1,0) + (x_2,0) = (x_3,0)$), with negation
($x = -y$ iff $(x,0) = -(y,0)$), and multiplication ($x_1 x_2 = x_3$ iff
$(x_1,0)(x_2,0) = (x_3,0)$), so we will no longer need to distinguish
between "real addition" and "complex addition", and similarly for
equality, negation, and multiplication. For instance, we can compute
$3(2,4)$ by identifying the real number $3$ with the complex
number $(3,0)$ and then computing $(3,0)(2,4) = (3 \times 2 - 0 \times 4, 3 \times 4 + 0 \times 2) = (6,12)$.
Note also that $0 = 0_{\mathbf{C}}$ and $1 = 1_{\mathbf{C}}$, so we
can now drop the $\mathbf{C}$ subscripts from the zero $0$ and the identity $1$.

We now define i to be the complex number i := (0,1). We can 
now reconstruct the informal definition of the complex numbers 
as a lemma: 


Lemma 15.6.7. Every complex number $z \in \mathbf{C}$ can be written as
$z = a + bi$ for exactly one pair $a, b$ of real numbers. Also, we have
$i^2 = -1$, and $-z = (-1)z$.


Proof. See Exercise 15.6.3. □


Because of this lemma, we will now refer to complex numbers 
in the more usual notation a+ bi, and discard the formal notation 
(a, b) henceforth. 
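
Before the formal pair notation is set aside, it may help to see the construction carried out mechanically. The following Python sketch is an illustration under our own naming assumptions (`c_add`, `c_neg`, `c_mul`, `embed` and `I` are hypothetical helper names, not notation from the text); it represents a complex number as a plain tuple $(a,b)$ and implements Definitions 15.6.3 and 15.6.5 directly.

```python
def c_add(z, w):
    """Complex addition on pairs: (a, b) + (c, d) := (a + c, b + d)."""
    (a, b), (c, d) = z, w
    return (a + c, b + d)


def c_neg(z):
    """Complex negation on pairs: -(a, b) := (-a, -b)."""
    a, b = z
    return (-a, -b)


def c_mul(z, w):
    """Complex multiplication on pairs: (a, b)(c, d) := (ac - bd, ad + bc)."""
    (a, b), (c, d) = z, w
    return (a * c - b * d, a * d + b * c)


def embed(x):
    """Identify the real number x with the complex number (x, 0)."""
    return (x, 0)


I = (0, 1)  # the complex number i

print(c_add((2, 4), (3, -1)))   # the sum worked out in Definition 15.6.3
print(c_mul(embed(3), (2, 4)))  # the product 3(2, 4) computed in the text
print(c_mul(I, I))              # should agree with embed(-1), i.e. i^2 = -1
```

Nothing here uses a built-in complex type, which is the point: every law in Lemmas 15.6.4, 15.6.6 and 15.6.7 reduces to arithmetic on pairs of real numbers.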


Definition 15.6.8 (Real and imaginary parts). If $z$ is a complex
number with the representation $z = a + bi$ for some real numbers
$a, b$, we shall call $a$ the real part of $z$ and denote $\Re(z) := a$, and call
$b$ the imaginary part of $z$ and denote $\Im(z) := b$, thus for instance
$\Re(3+4i) = 3$ and $\Im(3+4i) = 4$, and in general $z = \Re(z) + i\Im(z)$.
Note that $z$ is real iff $\Im(z) = 0$. We say that $z$ is imaginary
iff $\Re(z) = 0$, thus for instance $4i$ is imaginary, while $3 + 4i$ is
neither real nor imaginary, and $0$ is both real and imaginary. We
define the complex conjugate $\overline{z}$ of $z$ to be the complex number
$\overline{z} := \Re(z) - i\Im(z)$, thus for instance $\overline{3+4i} = 3 - 4i$, $\overline{i} = -i$, and
$\overline{3} = 3$.


The operation of complex conjugation has several nice properties:


Lemma 15.6.9 (Complex conjugation is an involution). Let $z, w$
be complex numbers, then $\overline{z+w} = \overline{z} + \overline{w}$, $\overline{-z} = -\overline{z}$, and $\overline{zw} = \overline{z}\,\overline{w}$.
Also $\overline{\overline{z}} = z$. Finally, we have $\overline{z} = \overline{w}$ if and only if $z = w$, and
$\overline{z} = z$ if and only if $z$ is real.


Proof. See Exercise 15.6.4. □




The notion of absolute value $|x|$ was defined for rational numbers
$x$ in Definition 4.3.1, and this definition extends to real numbers
in the obvious manner. However, we cannot extend this definition
directly to the complex numbers, as most complex numbers
are neither positive nor negative. (For instance, we do not classify
$i$ as either a positive or negative number; see Exercise 15.6.15 for
some reasons why.) However, we can still define absolute value by
generalizing the formula $|x| = \sqrt{x^2}$ from Exercise 5.6.3:


Definition 15.6.10 (Complex absolute value). If $z = a + bi$ is a
complex number, we define the absolute value $|z|$ of $z$ to be the
real number $|z| := \sqrt{a^2 + b^2} = (a^2 + b^2)^{1/2}$.


From Exercise 5.6.3 we see that this notion of absolute value 
generalizes the notion of real absolute value. The absolute value 
has a number of other good properties: 


Lemma 15.6.11 (Properties of complex absolute value). Let $z, w$
be complex numbers. Then $|z|$ is a non-negative real number, and
$|z| = 0$ if and only if $z = 0$. Also we have the identity $z\overline{z} = |z|^2$,
and so $|z| = \sqrt{z\overline{z}}$. As a consequence we have $|zw| = |z||w|$ and
$|\overline{z}| = |z|$. Finally, we have the inequalities
$$-|z| \leq \Re(z) \leq |z|; \quad -|z| \leq \Im(z) \leq |z|; \quad |z| \leq |\Re(z)| + |\Im(z)|$$
as well as the triangle inequality $|z + w| \leq |z| + |w|$.

Proof. See Exercise 15.6.6. □


Using the notion of absolute value, we can define a notion of 
reciprocal: 


Definition 15.6.12 (Complex reciprocal). If $z$ is a non-zero complex
number, we define the reciprocal $z^{-1}$ of $z$ to be the complex
number $z^{-1} := |z|^{-2}\overline{z}$ (note that $|z|^{-2}$ is well-defined as a positive
real number because $|z|$ is positive real, thanks to Lemma
15.6.11). Thus for instance $(1+2i)^{-1} = |1+2i|^{-2}(1-2i) = (1^2 + 2^2)^{-1}(1-2i) = \frac{1}{5} - \frac{2}{5}i$.
If $z$ is zero, $z = 0$, we leave the
reciprocal $0^{-1}$ undefined.




From the definition and Lemma 15.6.11, we see that
$$zz^{-1} = z^{-1}z = |z|^{-2}\overline{z}z = |z|^{-2}|z|^2 = 1,$$
and so $z^{-1}$ is indeed the reciprocal of $z$. We can thus define a
notion of quotient $z/w$ for any two complex numbers $z, w$ with
$w \neq 0$ in the usual manner by the formula $z/w := zw^{-1}$.
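
The reciprocal formula $z^{-1} := |z|^{-2}\overline{z}$ can be checked exactly on examples with rational coordinates. The Python sketch below is illustrative only; the helper names `conj`, `abs_sq`, `reciprocal` and `c_mul` are our own assumptions, complex numbers are again stored as pairs $(a,b)$, and `fractions.Fraction` is used so that no rounding occurs.

```python
from fractions import Fraction


def c_mul(z, w):
    """Complex multiplication on pairs: (a, b)(c, d) := (ac - bd, ad + bc)."""
    (a, b), (c, d) = z, w
    return (a * c - b * d, a * d + b * c)


def conj(z):
    """Complex conjugate of a + bi, namely a - bi."""
    a, b = z
    return (a, -b)


def abs_sq(z):
    """The squared absolute value |z|^2 = a^2 + b^2."""
    a, b = z
    return a * a + b * b


def reciprocal(z):
    """Return |z|^(-2) times the conjugate of z; undefined for z = 0."""
    n = abs_sq(z)
    if n == 0:
        raise ZeroDivisionError("the reciprocal of 0 is left undefined")
    a, b = conj(z)
    return (Fraction(a) / n, Fraction(b) / n)


z = (1, 2)                # the complex number 1 + 2i
z_inv = reciprocal(z)     # the text computes this as 1/5 - (2/5) i
print(z_inv, c_mul(z, z_inv))
```

Working with $|z|^2$ rather than $|z|$ avoids square roots altogether, which is why the computation stays inside the rationals.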
The complex numbers can be given a distance by defining
$$d(z,w) = |z - w|.$$


Lemma 15.6.13. The complex numbers $\mathbf{C}$ with the distance $d$
form a metric space. If $(z_n)_{n=1}^\infty$ is a sequence of complex numbers,
and $z$ is another complex number, then we have $\lim_{n \to \infty} z_n = z$
in this metric space if and only if $\lim_{n \to \infty} \Re(z_n) = \Re(z)$ and
$\lim_{n \to \infty} \Im(z_n) = \Im(z)$.


Proof. See Exercise 15.6.9. □


This metric space is in fact complete and connected, but not 
compact: see Exercises 15.6.10, 15.6.12, 15.6.13. We also have the 
usual limit laws: 


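Lemma 15.6.14 (Complex limit laws). Let $(z_n)_{n=1}^\infty$ and $(w_n)_{n=1}^\infty$
be convergent sequences of complex numbers, and let $c$ be a complex
number. Then the sequences $(z_n + w_n)_{n=1}^\infty$, $(z_n - w_n)_{n=1}^\infty$, $(cz_n)_{n=1}^\infty$,
$(z_n w_n)_{n=1}^\infty$, and $(\overline{z_n})_{n=1}^\infty$ are also convergent, with
$$\begin{aligned}
\lim_{n \to \infty} z_n + w_n &= \lim_{n \to \infty} z_n + \lim_{n \to \infty} w_n \\
\lim_{n \to \infty} z_n - w_n &= \lim_{n \to \infty} z_n - \lim_{n \to \infty} w_n \\
\lim_{n \to \infty} cz_n &= c \lim_{n \to \infty} z_n \\
\lim_{n \to \infty} z_n w_n &= (\lim_{n \to \infty} z_n)(\lim_{n \to \infty} w_n) \\
\lim_{n \to \infty} \overline{z_n} &= \overline{\lim_{n \to \infty} z_n}
\end{aligned}$$
Also, if the $w_n$ are all non-zero and $\lim_{n \to \infty} w_n$ is also non-zero,
then $(z_n / w_n)_{n=1}^\infty$ is also a convergent sequence, with
$$\lim_{n \to \infty} z_n / w_n = (\lim_{n \to \infty} z_n) / (\lim_{n \to \infty} w_n).$$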




Proof. See Exercise 15.6.14. □


Observe that the real and complex number systems are in fact 
quite similar; they both obey similar laws of arithmetic, and they 
have similar structure as metric spaces. Indeed many of the results 
in this textbook that were proven for real-valued functions, are 
also valid for complex-valued functions, simply by replacing “real” 
with “complex” in the proofs but otherwise leaving all the other 
details of the proof unchanged. Alternatively, one can always 
split a complex-valued function $f$ into real and imaginary parts
$\Re(f)$, $\Im(f)$, thus $f = \Re(f) + i\Im(f)$, and then deduce results for
the complex-valued function $f$ from the corresponding results for
the real-valued functions $\Re(f)$, $\Im(f)$. For instance, the theory
of pointwise and uniform convergence from Chapter 14, or the 
theory of power series from this chapter, extends without any 
difficulty to complex-valued functions. In particular, we can define 
the complex exponential function in exactly the same manner as 
for real numbers: 


Definition 15.6.15 (Complex exponential). If $z$ is a complex
number, we define the function $\exp(z)$ by the formula
$$\exp(z) := \sum_{n=0}^\infty \frac{z^n}{n!}.$$


One can state and prove the ratio test for complex series and
use it to show that $\exp(z)$ converges for every $z$. It turns out
that many of the properties from Theorem 15.5.2 still hold: we
have that $\exp(z + w) = \exp(z)\exp(w)$, for instance; see Exercise
15.6.16. (The other properties require complex differentiation and
complex integration, but these topics are beyond the scope of this
text.) Another useful observation is that $\overline{\exp(z)} = \exp(\overline{z})$; this
can be seen by conjugating the partial sums $\sum_{n=0}^{N} \frac{z^n}{n!}$ and taking
limits as $N \to \infty$.
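
Both observations in the preceding paragraph can be tested numerically on partial sums. The following Python sketch is an informal check, not a proof; it relies on Python's built-in `complex` type rather than the pair construction, and the function name `cexp_partial`, the choice of 60 terms, and the sample points `z`, `w` are our own assumptions.

```python
def cexp_partial(z, terms=60):
    """Partial sum of z**n / n! for n = 0, ..., terms - 1, for complex z."""
    total, term = 0j, 1 + 0j  # term holds z**n / n! for the current n
    for n in range(terms):
        total += term
        term *= z / (n + 1)
    return total


z, w = 0.3 + 1.2j, -0.8 + 0.5j

# exp(z + w) versus exp(z) exp(w), up to rounding error.
product_gap = abs(cexp_partial(z + w) - cexp_partial(z) * cexp_partial(w))

# The conjugate of exp(z) versus exp of the conjugate of z.
conjugate_gap = abs(cexp_partial(z).conjugate() - cexp_partial(z.conjugate()))

print(product_gap, conjugate_gap)
```

The conjugation identity holds for each partial sum separately, since conjugation respects sums and products (Lemma 15.6.9); the product identity holds only in the limit, so a small nonzero gap there is expected.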

The complex logarithm turns out to be somewhat more subtle,
mainly because exp is no longer invertible, and also because the
various power series for the logarithm only have a finite radius of
convergence (unlike exp, which has an infinite radius of convergence).
This rather delicate issue is beyond the scope of this text
and will not be discussed here.


Exercise 15.6.1. Prove Lemma 15.6.4. 
Exercise 15.6.2. Prove Lemma 15.6.6. 
Exercise 15.6.3. Prove Lemma 15.6.7. 
Exercise 15.6.4. Prove Lemma 15.6.9. 


Exercise 15.6.5. If $z$ is a complex number, show that $\Re(z) = \frac{z + \overline{z}}{2}$ and
$\Im(z) = \frac{z - \overline{z}}{2i}$.


Exercise 15.6.6. Prove Lemma 15.6.11. (Hint: to prove the triangle
inequality, first prove that $\Re(z\overline{w}) \leq |z||w|$, and hence (from Exercise
15.6.5) that $z\overline{w} + \overline{z}w \leq 2|z||w|$. Then add $|z|^2 + |w|^2$ to both sides of
this inequality.)

Exercise 15.6.7. Show that if $z, w$ are complex numbers with $w \neq 0$,
then $|z/w| = |z|/|w|$.

Exercise 15.6.8. Let z,w be non-zero complex numbers. Show that 


|z+w| = |z|+|w| if and only if there exists a positive real number c > 0 
such that z = cw. 


Exercise 15.6.9. Prove Lemma 15.6.13. 


Exercise 15.6.10. Show that the complex numbers C (with the usual 
metric d) form a complete metric space. 


Exercise 15.6.11. Let $f : \mathbf{R}^2 \to \mathbf{C}$ be the map $f(a,b) := a + bi$. Show
that $f$ is a bijection, and that $f$ and $f^{-1}$ are both continuous maps.


Exercise 15.6.12. Show that the complex numbers C (with the usual 
metric d) form a connected metric space. (Hint: first show that C is 
path connected, as in Exercise 13.4.7.) 


Exercise 15.6.13. Let E be a subset of C. Show that E is compact if and 
only if E is closed and bounded. (Hint: combine Exercise 15.6.11 with 
the Heine-Borel theorem, Theorem 12.5.7.) In particular, show that C 
is not compact. 


Exercise 15.6.14. Prove Lemma 15.6.14. (Hint: split $z_n$ and $w_n$ into
real and imaginary parts and use the usual limit laws, Lemma 6.1.19,
combined with Lemma 15.6.13.)




Exercise 15.6.15. The purpose of this exercise is to explain why we
do not try to organize the complex numbers into positive and negative
parts. Suppose that there was a notion of a "positive complex number"
and a "negative complex number" which obeyed the following reasonable
axioms (cf. Proposition 4.2.9):


• (Trichotomy) For every complex number $z$, exactly one of the
following statements is true: $z$ is positive, $z$ is negative, $z$ is zero.


• (Negation) If $z$ is a positive complex number, then $-z$ is negative.
If $z$ is a negative complex number, then $-z$ is positive.


• (Additivity) If $z$ and $w$ are positive complex numbers, then $z + w$
is also positive.


• (Multiplicativity) If $z$ and $w$ are positive complex numbers, then
$zw$ is also positive.


Show that these four axioms are inconsistent, i.e., one can use these
axioms to deduce a contradiction. (Hints: first use the axioms to deduce
that $1$ is positive, and then conclude that $-1$ is negative. Then apply
the Trichotomy axiom to $z = i$ and obtain a contradiction in any one of
the three cases.)

Exercise 15.6.16. Prove the ratio test for complex series, and use it to 
show that the series used to define the complex exponential is absolutely 
convergent. Then prove that exp(z+w) = exp(z) exp(w) for all complex 
numbers z, w. 


15.7 Trigonometric functions 


We now discuss the next most important class of special func- 
tions, after the exponential and logarithmic functions, namely the 
trigonometric functions. (There are several other useful special 
functions in mathematics, such as the hyperbolic trigonometric 
functions and hypergeometric functions, the gamma and zeta func- 
tions, and elliptic functions, but they occur more rarely and will 
not be discussed here.) 

Trigonometric functions are often defined using geometric con- 
cepts, notably those of circles, triangles, and angles. However, it 
is also possible to define them using more analytic concepts, and 
in particular the (complex) exponential function. 




Definition 15.7.1 (Trigonometric functions). If $z$ is a complex
number, then we define
$$\cos(z) := \frac{e^{iz} + e^{-iz}}{2}$$
and
$$\sin(z) := \frac{e^{iz} - e^{-iz}}{2i}.$$
We refer to cos and sin as the cosine and sine functions respectively.


These formulae were discovered by Leonhard Euler (1707- 
1783) in 1748, who recognized the link between the complex expo- 
nential and the trigonometric functions. Note that since we have 
defined the sine and cosine for complex numbers z, we automati- 
cally have defined them also for real numbers x. In fact in most 
applications one is only interested in the trigonometric functions 
when applied to real numbers. 

From the power series definition of exp, we have
$$e^{iz} = 1 + iz - \frac{z^2}{2!} - \frac{iz^3}{3!} + \frac{z^4}{4!} + \ldots$$
and
$$e^{-iz} = 1 - iz - \frac{z^2}{2!} + \frac{iz^3}{3!} + \frac{z^4}{4!} - \ldots$$
and so from the above formulae we have
$$\cos(z) = 1 - \frac{z^2}{2!} + \frac{z^4}{4!} - \ldots = \sum_{n=0}^\infty \frac{(-1)^n z^{2n}}{(2n)!}$$
and
$$\sin(z) = z - \frac{z^3}{3!} + \frac{z^5}{5!} - \ldots = \sum_{n=0}^\infty \frac{(-1)^n z^{2n+1}}{(2n+1)!}.$$
In particular, $\cos(x)$ and $\sin(x)$ are always real whenever $x$ is real.
From the ratio test we see that the two power series $\sum_{n=0}^\infty \frac{(-1)^n x^{2n}}{(2n)!}$ and
$\sum_{n=0}^\infty \frac{(-1)^n x^{2n+1}}{(2n+1)!}$ are absolutely convergent for every $x$, thus $\sin(x)$
and $\cos(x)$ are real analytic at $0$ with an infinite radius of convergence.
From Exercise 15.2.8 we thus see that the sine and cosine
functions are real analytic on all of $\mathbf{R}$. (They are also complex
analytic on all of $\mathbf{C}$, but we will not pursue this matter in this
text.) In particular the sine and cosine functions are continuous
and differentiable.
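
The two power series above translate directly into a short numerical routine. The Python sketch below is an illustration only; the names `cos_series` and `sin_series` and the choice of 30 terms are our own assumptions, and `math.cos` and `math.sin` serve purely as independent reference values.

```python
import math


def cos_series(x, terms=30):
    """Partial sum of (-1)**n * x**(2n) / (2n)! for n = 0, ..., terms - 1."""
    total, term = 0.0, 1.0
    for n in range(terms):
        total += term
        # ratio of consecutive terms: -x^2 / ((2n + 1)(2n + 2))
        term *= -x * x / ((2 * n + 1) * (2 * n + 2))
    return total


def sin_series(x, terms=30):
    """Partial sum of (-1)**n * x**(2n+1) / (2n+1)! for n = 0, ..., terms - 1."""
    total, term = 0.0, x
    for n in range(terms):
        total += term
        # ratio of consecutive terms: -x^2 / ((2n + 2)(2n + 3))
        term *= -x * x / ((2 * n + 2) * (2 * n + 3))
    return total


for x in (0.0, 1.0, -2.5):
    print(x, cos_series(x) - math.cos(x), sin_series(x) - math.sin(x))
```

The ratio between consecutive terms tends to zero for every fixed $x$, which is the ratio test argument in computational form; for large $|x|$ more terms would be needed and rounding error would grow.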

We list some basic properties of the sine and cosine functions 
below. 


Theorem 15.7.2 (Trigonometric identities). Let $x, y$ be real numbers.

(a) We have $\sin(x)^2 + \cos(x)^2 = 1$. In particular, we have $\sin(x) \in [-1,1]$ and $\cos(x) \in [-1,1]$ for all $x \in \mathbf{R}$.

(b) We have $\sin'(x) = \cos(x)$ and $\cos'(x) = -\sin(x)$.

(c) We have $\sin(-x) = -\sin(x)$ and $\cos(-x) = \cos(x)$.

(d) We have $\cos(x+y) = \cos(x)\cos(y) - \sin(x)\sin(y)$ and $\sin(x+y) = \sin(x)\cos(y) + \cos(x)\sin(y)$.

(e) We have $\sin(0) = 0$ and $\cos(0) = 1$.

(f) We have $e^{ix} = \cos(x) + i\sin(x)$ and $e^{-ix} = \cos(x) - i\sin(x)$. In particular $\cos(x) = \Re(e^{ix})$ and $\sin(x) = \Im(e^{ix})$.

Proof. See Exercise 15.7.1. □
Now we describe some other properties of sin and cos. 


Lemma 15.7.3. There exists a positive number $x$ such that $\sin(x)$
is equal to $0$.


Proof. Suppose for sake of contradiction that $\sin(x) \neq 0$ for all
$x \in (0,\infty)$. Observe that this would also imply that $\cos(x) \neq 0$
for all $x \in (0,\infty)$, since if $\cos(x) = 0$ then $\sin(2x) = 0$ by
Theorem 15.7.2(d) (why?). Since $\cos(0) = 1$, this implies by the
intermediate value theorem (Theorem 9.7.1) that $\cos(x) > 0$ for all
$x > 0$ (why?). Also, since $\sin(0) = 0$ and $\sin'(0) = 1 > 0$, we see
that sin is increasing near $0$, hence is positive to the right of $0$. By
the intermediate value theorem again we conclude that $\sin(x) > 0$
for all $x > 0$ (otherwise sin would have a zero on $(0,\infty)$).

In particular if we define the cotangent function $\cot(x) := \cos(x)/\sin(x)$,
then $\cot(x)$ would be positive and differentiable on
all of $(0,\infty)$. From the quotient rule (Theorem 10.1.13(h)) and
Theorem 15.7.2 we see that the derivative of $\cot(x)$ is $-1/\sin(x)^2$
(why?). In particular, we have $\cot'(x) \leq -1$ for all $x > 0$. By
the fundamental theorem of calculus (Theorem 11.9.1) this implies
that $\cot(x+s) \leq \cot(x) - s$ for all $x > 0$ and $s > 0$. But letting
$s \to \infty$ we see that this contradicts our assertion that cot is positive
on $(0,\infty)$ (why?). □


Let $E$ be the set $E := \{x \in (0,+\infty) : \sin(x) = 0\}$, i.e., $E$ is the
set of roots of sin on $(0,+\infty)$. By Lemma 15.7.3, $E$ is non-empty.
Since $\sin'(0) > 0$, there exists a $c > 0$ such that $E \subseteq [c,+\infty)$ (see
Exercise 15.7.2). Also, since sin is continuous in $[c,+\infty)$, $E$ is
closed in $[c,+\infty)$ (why? use Theorem 13.1.5(d)). Since $[c,+\infty)$ is
closed in $\mathbf{R}$, we conclude that $E$ is closed in $\mathbf{R}$. Thus $E$ contains
all its adherent points, and thus contains $\inf(E)$. Thus if we make
the definition


Definition 15.7.4. We define $\pi$ to be the number
$$\pi := \inf\{x \in (0,\infty) : \sin(x) = 0\}$$


then we have $\pi \in E \subseteq [c,+\infty)$ (so in particular $\pi > 0$) and
$\sin(\pi) = 0$. By definition of $\pi$, sin cannot have any zeroes in
$(0,\pi)$, and so in particular must be positive on $(0,\pi)$ (cf. the arguments
in Lemma 15.7.3 using the intermediate value theorem).
Since $\cos'(x) = -\sin(x)$, we thus conclude that $\cos(x)$ is strictly
decreasing on $(0,\pi)$. Since $\cos(0) = 1$, this implies in particular
that $\cos(\pi) < 1$; since $\sin^2(\pi) + \cos^2(\pi) = 1$ and $\sin(\pi) = 0$, we
thus conclude that $\cos(\pi) = -1$.

In particular we have Euler's famous formula
$$e^{\pi i} = \cos(\pi) + i\sin(\pi) = -1.$$
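
The definition of $\pi$ as the smallest positive root of sin suggests a simple way to approximate it. The Python sketch below is an illustration under stated assumptions and not a rigorous computation: the names `sin_series` and `first_positive_root` are our own, sin is evaluated by a 30-term partial sum of its power series, and we assume that a scan with step $0.1$ is fine enough not to jump over the first root. It scans to the right until sin stops being positive, then refines the root by bisection, and finally tests Euler's formula using the standard library function `cmath.exp`.

```python
import cmath


def sin_series(x, terms=30):
    """Partial sum of (-1)**n * x**(2n+1) / (2n+1)! for n = 0, ..., terms - 1."""
    total, term = 0.0, x
    for n in range(terms):
        total += term
        term *= -x * x / ((2 * n + 2) * (2 * n + 3))
    return total


def first_positive_root(f, step=0.1, tol=1e-12):
    """Scan right from `step` while f stays positive, then bisect.

    Assumes f(step) > 0 and that f has no pair of roots inside one step.
    """
    lo = step
    while f(lo + step) > 0:
        lo += step
    hi = lo + step  # here f(lo) > 0 and f(hi) <= 0
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if f(mid) > 0:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


pi_estimate = first_positive_root(sin_series)
euler_gap = abs(cmath.exp(1j * pi_estimate) + 1)  # |e^{pi i} + 1|
print(pi_estimate, euler_gap)
```

The bisection step mirrors the intermediate value theorem arguments used in the text: sin is positive to the left of the root and non-positive at the right endpoint, so a root is trapped in between.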




We now conclude with some other properties of sine and cosine. 


Theorem 15.7.5 (Periodicity of trigonometric functions). Let $x$
be a real number.

(a) We have $\cos(x + \pi) = -\cos(x)$ and $\sin(x + \pi) = -\sin(x)$. In particular we have $\cos(x + 2\pi) = \cos(x)$ and $\sin(x + 2\pi) = \sin(x)$, i.e., sin and cos are periodic with period $2\pi$.

(b) We have $\sin(x) = 0$ if and only if $x/\pi$ is an integer.

(c) We have $\cos(x) = 0$ if and only if $x/\pi$ is an integer plus $1/2$.

Proof. See Exercise 15.7.3. □


We can of course define all the other trigonometric functions: 
tangent, cotangent, secant, and cosecant, and develop all the fa- 
miliar identities of trigonometry; some examples of this are given 
in the exercises. 


Exercise 15.7.1. Prove Theorem 15.7.2. (Hint: write everything in terms 
of exponentials whenever possible.) 


Exercise 15.7.2. Let $f : \mathbf{R} \to \mathbf{R}$ be a function which is differentiable at
$x_0$, with $f(x_0) = 0$ and $f'(x_0) \neq 0$. Show that there exists a $c > 0$ such
that $f(y)$ is non-zero whenever $0 < |x_0 - y| < c$. Conclude in particular
that there exists a $c > 0$ such that $\sin(x) \neq 0$ for all $0 < x < c$.


Exercise 15.7.3. Prove Theorem 15.7.5. (Hint: for (c), you may wish
to first compute $\sin(\pi/2)$ and $\cos(\pi/2)$, and then link $\cos(x)$ to
$\sin(x + \pi/2)$.)

Exercise 15.7.4. Let $x, y$ be real numbers such that $x^2 + y^2 = 1$. Show
that there is exactly one real number $\theta \in (-\pi, \pi]$ such that $x = \sin(\theta)$
and $y = \cos(\theta)$. (Hint: you may need to divide into cases depending on
whether $x, y$ are positive, negative, or zero.)


Exercise 15.7.5. Show that if $r, s > 0$ are positive real numbers, and $\theta, \alpha$
are real numbers such that $re^{i\theta} = se^{i\alpha}$, then $r = s$ and $\theta = \alpha + 2\pi k$ for
some integer $k$.




Exercise 15.7.6. Let z be a non-zero complex real number. Using Exer- 
cise 15.7.4, show that there is exactly one pair of real numbers r, @ such 
that r > 0, 0 € (—a,z], and z = ret. (This is sometimes known as the 
standard polar representation of z.) 


Exercise 15.7.7. For any real number 9 and integer n, prove the de 
Moivre identities 


cos(n@) = R((cos 8 + isin 0)”); sin(n#) = S((cos@ + isin @)”). 


Exercise 15.7.8. Let tan : (—7/2,7/2) — R be the tangent function 
tan(x) := sin(x)/ cos(x). Show that tan is differentiable and monotone 
increasing, with -Æ tan(x) = 1+ tan(x)’, and that lim,_,,/2tan(z) = 
+oo and lim, _,,/2tan(z) = —oo. Conclude that tan is in fact a 
bijection from (—7/2,7/2) — R, and thus has an inverse function 
tan-! : R — (—7/2,7/2) (this function is called the arctangent func- 
tion). Show that tan! is differentiable and 4 tan™i (z) = gr- 


Exercise 15.7.9. Recall the arctangent function $\tan^{-1}$ from Exercise
15.7.8. By modifying the proof of Theorem 15.5.6(e), establish the
identity

$$\tan^{-1}(x) = \sum_{n=0}^\infty \frac{(-1)^n x^{2n+1}}{2n+1}$$

for all $x \in (-1, 1)$. Using Abel's theorem (Theorem 15.3.1) to extend
this identity to the case $x = 1$, conclude in particular the identity

$$\pi = 4 - \frac{4}{3} + \frac{4}{5} - \frac{4}{7} + \ldots = 4\sum_{n=0}^\infty \frac{(-1)^n}{2n+1}.$$

(Note that the series converges by the alternating series test, Proposition
7.2.12.) Conclude in particular that $4 - \frac{4}{3} < \pi < 4$. (One can of course
compute $\pi = 3.1415926\ldots$ to much higher accuracy, though if one wishes
to do so it is advisable to use a different formula than the one above,
which converges very slowly.)
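To get a feeling for how slowly the series $4 - \frac{4}{3} + \frac{4}{5} - \ldots$ converges, one can tabulate a few partial sums. The Python sketch below is our own illustration (the name `leibniz_partial_sum` is hypothetical). For an alternating series with decreasing terms, the error after $N$ terms is at most the first omitted term $4/(2N+1)$, so roughly a thousand terms are needed for three decimal places.

```python
import math

def leibniz_partial_sum(terms):
    """Sum of the first `terms` terms of 4 - 4/3 + 4/5 - 4/7 + ..."""
    return sum(4.0 * (-1) ** n / (2 * n + 1) for n in range(terms))

# Print the number of terms, the partial sum, and its distance from pi.
for terms in (1, 2, 10, 1000):
    approx = leibniz_partial_sum(terms)
    print(terms, approx, abs(math.pi - approx))
```

The first two partial sums are $4$ and $4 - \frac{4}{3}$, which bracket $\pi$ as the exercise asks one to show.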


Exercise 15.7.10. Let $f : \mathbf{R} \to \mathbf{R}$ be the function

$$f(x) := \sum_{n=1}^\infty 4^{-n}\cos(32^n \pi x).$$


(a) Show that this series is uniformly convergent, and that f is con- 
tinuous. 




(b) Show that for every integer $j$ and every integer $m \geq 1$, we have

$$\left| f\left(\frac{j+1}{32^m}\right) - f\left(\frac{j}{32^m}\right) \right| \geq 4^{-m}.$$

(Hint: use the identity

$$\sum_{n=1}^\infty a_n = \left(\sum_{n=1}^{m-1} a_n\right) + a_m + \sum_{n=m+1}^\infty a_n$$

for certain sequences $a_n$. Also, use the fact that the cosine
function is periodic with period $2\pi$, as well as the geometric series
formula $\sum_{n=0}^\infty r^n = \frac{1}{1-r}$ for any $|r| < 1$. Finally, you will need the
inequality $|\cos(x) - \cos(y)| \leq |x - y|$ for any real numbers $x$ and
$y$; this can be proven by using the mean value theorem (Corollary
10.2.9), or the fundamental theorem of calculus (Theorem
11.9.4).)


(c) Using (b), show that for every real number $x_0$, the function $f$ is
not differentiable at $x_0$. (Hint: for every $x_0$ and every $m \geq 1$,
there exists an integer $j$ such that $j \leq 32^m x_0 \leq j + 1$, thanks to
Exercise 5.4.3.)


(d) Explain briefly why the result in (c) does not contradict Corollary
14.7.3.


Chapter 16 


Fourier series 


In the previous two chapters, we discussed the issue of how cer- 
tain functions (for instance, compactly supported continuous func- 
tions) could be approximated by polynomials. Later, we showed 
how a different class of functions (real analytic functions) could 
be written exactly (not approximately) as an infinite polynomial, 
or more precisely a power series. 

Power series are already immensely useful, especially when 
dealing with special functions such as the exponential and trigono- 
metric functions discussed earlier. However, there are some cir- 
cumstances where power series are not so useful, because one has 
to deal with functions (e.g., $\sqrt{x}$) which are not real analytic, and
so do not have power series. 

Fortunately, there is another type of series expansion, known 
as Fourier series, which is also a very powerful tool in analysis 
(though used for slightly different purposes). Instead of analyzing 
compactly supported functions, it instead analyzes periodic func- 
tions; instead of decomposing into polynomials, it decomposes 
into trigonometric polynomials. Roughly speaking, the theory of 
Fourier series asserts that just about every periodic function can 
be decomposed as an (infinite) sum of sines and cosines. 


Remark 16.0.6. Jean-Baptiste Fourier (1768-1830) was, among
other things, the governor of Egypt during the reign of Napoleon.
After the Napoleonic wars, he returned to mathematics. He
introduced Fourier series in an important 1807 paper in which he used
them to solve what is now known as the heat equation. At the
time, the claim that every periodic function could be expressed as
a sum of sines and cosines was extremely controversial; even such
leading mathematicians as Euler declared that it was impossible.
Nevertheless, Fourier managed to show that this was indeed the
case, although the proof was not completely rigorous and was
not totally accepted for almost another hundred years.


There will be some similarities between the theory of Fourier
series and that of power series, but there are also some major
differences. For instance, the convergence of Fourier series is usually
not uniform (i.e., not in the $L^\infty$ metric), but instead we have
convergence in a different metric, the $L^2$ metric. Also, we will need
to use complex numbers heavily in our theory, while they played
only a tangential role in power series.

The theory of Fourier series (and of related topics such as 
Fourier integrals and the Laplace transform) is vast, and deserves 
an entire course in itself. It has many, many applications, most 
directly to differential equations, signal processing, electrical en- 
gineering, physics, and analysis, but also to algebra and number 
theory. We will only give the barest bones of the theory here, 
however, and almost no applications. 


16.1 Periodic functions 


The theory of Fourier series has to do with the analysis of periodic 
functions, which we now define. It turns out to be convenient to 
work with complex-valued functions rather than real-valued ones. 


Definition 16.1.1. Let $L > 0$ be a real number. A function
$f : \mathbf{R} \to \mathbf{C}$ is periodic with period $L$, or $L$-periodic, if we have
$f(x + L) = f(x)$ for every real number $x$.


Example 16.1.2. The real-valued functions $f(x) = \sin(x)$ and
$f(x) = \cos(x)$ are $2\pi$-periodic, as is the complex-valued function
$f(x) = e^{ix}$. These functions are also $4\pi$-periodic, $6\pi$-periodic, etc.
(why?). The function $f(x) = x$, however, is not periodic. The
constant function $f(x) = 1$ is $L$-periodic for every $L$.




Remark 16.1.3. If a function $f$ is $L$-periodic, then we have
$f(x + kL) = f(x)$ for every integer $k$ (why? Use induction for
the positive $k$, and then use a substitution to convert the
positive $k$ result to a negative $k$ result. The $k = 0$ case is of course
trivial). In particular, if a function $f$ is 1-periodic, then we have
$f(x + k) = f(x)$ for every $k \in \mathbf{Z}$. Because of this, 1-periodic
functions are sometimes also called $\mathbf{Z}$-periodic (and $L$-periodic
functions called $L\mathbf{Z}$-periodic).


Example 16.1.4. For any integer $n$, the functions $\cos(2\pi nx)$,
$\sin(2\pi nx)$, and $e^{2\pi inx}$ are all $\mathbf{Z}$-periodic. (What happens when $n$
is not an integer?) Another example of a $\mathbf{Z}$-periodic function is
the function $f : \mathbf{R} \to \mathbf{C}$ defined by $f(x) := 1$ when $x \in [n, n + \frac{1}{2})$
for some integer $n$, and $f(x) := 0$ when $x \in [n + \frac{1}{2}, n + 1)$ for some
integer $n$. This function is an example of a square wave.


Henceforth, for simplicity, we shall only deal with functions
which are $\mathbf{Z}$-periodic (for the Fourier theory of $L$-periodic
functions, see Exercise 16.5.6). Note that in order to completely
specify a $\mathbf{Z}$-periodic function $f : \mathbf{R} \to \mathbf{C}$, one only needs to specify its
values on the interval $[0, 1)$, since this will determine the values
of $f$ everywhere else. This is because every real number $x$ can be
written in the form $x = k + y$ where $k$ is an integer (called the
integer part of $x$, and sometimes denoted $[x]$) and $y \in [0, 1)$ (this
is called the fractional part of $x$, and sometimes denoted $\{x\}$);
see Exercise 16.1.1. Because of this, sometimes when we wish to
describe a $\mathbf{Z}$-periodic function $f$ we just describe what it does on
the interval $[0, 1)$, and then say that it is extended periodically to
all of $\mathbf{R}$. This means that we define $f(x)$ for any real number $x$
by setting $f(x) := f(y)$, where we have decomposed $x = k + y$ as
discussed above. (One can in fact replace the interval $[0, 1)$ by any
other half-open interval of length 1, but we will not do so here.)
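The periodic extension procedure just described is easy to mimic on a computer. The Python sketch below is our own illustration (the names `integer_and_fractional_part`, `extend_periodically` and `square_wave` are hypothetical): it splits $x = k + y$ with $k$ an integer and $y \in [0, 1)$, and then evaluates a function given on $[0, 1)$ at the fractional part. We assume floating-point inputs of moderate size; for extremely small negative inputs, rounding can push the computed fractional part up to 1.

```python
import math

def integer_and_fractional_part(x):
    """Split x as k + y with k an integer and y in [0, 1)."""
    k = math.floor(x)
    return k, x - k

def extend_periodically(g):
    """Given g defined on [0, 1), return its Z-periodic extension to R."""
    def f(x):
        _, y = integer_and_fractional_part(x)
        return g(y)
    return f

# The square wave: 1 on [0, 1/2) and 0 on [1/2, 1), extended periodically.
square_wave = extend_periodically(lambda y: 1.0 if y < 0.5 else 0.0)
print(square_wave(2.25), square_wave(-0.25))
```

Note that $-0.25 = -1 + 0.75$, so the integer part of a negative number is rounded down, not towards zero.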

The space of complex-valued continuous $\mathbf{Z}$-periodic functions
is denoted $C(\mathbf{R}/\mathbf{Z}; \mathbf{C})$. (The notation $\mathbf{R}/\mathbf{Z}$ comes from algebra,
and denotes the quotient group of the additive group $\mathbf{R}$ by the
additive group $\mathbf{Z}$; more information on this can be found in any
algebra text.) By "continuous" we mean continuous at all points
on $\mathbf{R}$; merely being continuous on an interval such as $[0, 1]$ will
not suffice, as there may be a discontinuity between the left and
right limits at 1 (or at any other integer). Thus for instance,
the functions $\sin(2\pi nx)$, $\cos(2\pi nx)$, and $e^{2\pi inx}$ are all elements
of $C(\mathbf{R}/\mathbf{Z}; \mathbf{C})$, as are the constant functions; however, the square
wave function described earlier is not in $C(\mathbf{R}/\mathbf{Z}; \mathbf{C})$ because it is
not continuous. The function $\sin(x)$ would also not qualify to
be in $C(\mathbf{R}/\mathbf{Z}; \mathbf{C})$, since it is not $\mathbf{Z}$-periodic.


Lemma 16.1.5 (Basic properties of $C(\mathbf{R}/\mathbf{Z}; \mathbf{C})$).


(a) (Boundedness) If $f \in C(\mathbf{R}/\mathbf{Z}; \mathbf{C})$, then $f$ is bounded (i.e.,
there exists a real number $M > 0$ such that $|f(x)| \leq M$ for
all $x \in \mathbf{R}$).


(b) (Vector space and algebra properties) If $f, g \in C(\mathbf{R}/\mathbf{Z}; \mathbf{C})$,
then the functions $f + g$, $f - g$, and $fg$ are also in $C(\mathbf{R}/\mathbf{Z}; \mathbf{C})$.
Also, if $c$ is any complex number, then the function $cf$ is also
in $C(\mathbf{R}/\mathbf{Z}; \mathbf{C})$.


(c) (Closure under uniform limits) If $(f_n)_{n=1}^\infty$ is a sequence of
functions in $C(\mathbf{R}/\mathbf{Z}; \mathbf{C})$ which converges uniformly to
another function $f : \mathbf{R} \to \mathbf{C}$, then $f$ is also in $C(\mathbf{R}/\mathbf{Z}; \mathbf{C})$.


Proof. See Exercise 16.1.2. □


One can make $C(\mathbf{R}/\mathbf{Z}; \mathbf{C})$ into a metric space by re-introducing
the now familiar sup-norm metric

$$d_\infty(f, g) = \sup_{x \in \mathbf{R}} |f(x) - g(x)| = \sup_{x \in [0,1)} |f(x) - g(x)|$$

of uniform convergence. (Why is the first supremum the same as
the second?) See Exercise 16.1.3.


Exercise 16.1.1. Show that every real number $x$ can be written in exactly
one way in the form $x = k + y$, where $k$ is an integer and $y \in [0, 1)$. (Hint:
to prove existence of such a representation, set $k := \sup\{l \in \mathbf{Z} : l \leq x\}$.)




Exercise 16.1.2. Prove Lemma 16.1.5. (Hint: for (a), first show that $f$
is bounded on $[0, 1]$.)


Exercise 16.1.3. Show that $C(\mathbf{R}/\mathbf{Z}; \mathbf{C})$ with the sup-norm metric $d_\infty$ is
a metric space. Furthermore, show that this metric space is complete.


16.2 Inner products on periodic functions 


From Lemma 16.1.5 we know that we can add, subtract, multiply, 
and take limits of continuous periodic functions. We will need a 
couple more operations on the space C(R/Z;C), though. The 
first one is that of inner product. 


Definition 16.2.1 (Inner product). If $f, g \in C(\mathbf{R}/\mathbf{Z}; \mathbf{C})$, we
define the inner product $\langle f, g \rangle$ to be the quantity

$$\langle f, g \rangle = \int_{[0,1]} f(x)\overline{g(x)}\,dx.$$


Remark 16.2.2. In order to integrate a complex-valued
function, $f(x) = g(x) + ih(x)$, we use the definition that $\int_{[a,b]} f :=
\int_{[a,b]} g + i\int_{[a,b]} h$; i.e., we integrate the real and imaginary parts
of the function separately. Thus for instance $\int_{[1,2]} (1 + ix)\,dx =
\int_{[1,2]} 1\,dx + i\int_{[1,2]} x\,dx = 1 + \frac{3}{2}i$. It is easy to verify that all the
standard rules of calculus (integration by parts, fundamental
theorem of calculus, substitution, etc.) still hold when the functions
are complex-valued instead of real-valued.


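Example 16.2.3. Let $f$ be the constant function $f(x) := 1$, and
let $g(x)$ be the function $g(x) := e^{2\pi ix}$. Then we have

$$\begin{aligned}
\langle f, g \rangle &= \int_{[0,1]} 1 \cdot \overline{e^{2\pi ix}}\,dx \\
&= \int_{[0,1]} e^{-2\pi ix}\,dx \\
&= \frac{e^{-2\pi ix}}{-2\pi i}\Big|_{x=0}^{x=1} = \frac{e^{-2\pi i} - e^0}{-2\pi i} = 0.
\end{aligned}$$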

Remark 16.2.4. In general, the inner product $\langle f, g \rangle$ will be a
complex number. (Note that $f(x)\overline{g(x)}$ will be Riemann integrable
since both functions are bounded and continuous.)
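The inner product can be approximated numerically by a Riemann sum, treating real and imaginary parts together through Python's built-in complex numbers. The sketch below is our own illustration (the names `inner_product`, `one` and `e1` are hypothetical, and the number of sample points is an arbitrary choice). It approximates $\langle 1, e^{2\pi ix} \rangle$, whose exact value is $0$ because $e^{-2\pi ix}$ integrates to zero over a full period.

```python
import cmath

def inner_product(f, g, samples=1000):
    """Left-endpoint Riemann sum for the integral over [0, 1] of
    f(x) * conj(g(x)) dx, using `samples` equal subintervals."""
    total = 0j
    for k in range(samples):
        x = k / samples
        total += f(x) * g(x).conjugate()
    return total / samples

one = lambda x: 1 + 0j                        # the constant function 1
e1 = lambda x: cmath.exp(2j * cmath.pi * x)   # the function e^{2 pi i x}

print(abs(inner_product(one, e1)))   # close to 0
print(inner_product(e1, e1))         # close to 1
```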


Roughly speaking, the inner product $\langle f, g \rangle$ is to the space
$C(\mathbf{R}/\mathbf{Z}; \mathbf{C})$ what the dot product $x \cdot y$ is to Euclidean spaces such
as $\mathbf{R}^n$. We list some basic properties of the inner product below;
a more in-depth study of inner products on vector spaces can be
found in any linear algebra text but is beyond the scope of this
text.


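Lemma 16.2.5. Let $f, g, h \in C(\mathbf{R}/\mathbf{Z}; \mathbf{C})$.


(a) (Hermitian property) We have $\langle g, f \rangle = \overline{\langle f, g \rangle}$.


(b) (Positivity) We have $\langle f, f \rangle \geq 0$. Furthermore, we have
$\langle f, f \rangle = 0$ if and only if $f = 0$ (i.e., $f(x) = 0$ for all $x \in \mathbf{R}$).


(c) (Linearity in the first variable) We have $\langle f + g, h \rangle = \langle f, h \rangle +
\langle g, h \rangle$. For any complex number $c$, we have $\langle cf, g \rangle = c\langle f, g \rangle$.


(d) (Antilinearity in the second variable) We have $\langle f, g + h \rangle =
\langle f, g \rangle + \langle f, h \rangle$. For any complex number $c$, we have $\langle f, cg \rangle =
\overline{c}\langle f, g \rangle$.


Proof. See Exercise 16.2.1. □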


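From the positivity property, it makes sense to define the $L^2$
norm $\|f\|_2$ of a function $f \in C(\mathbf{R}/\mathbf{Z}; \mathbf{C})$ by the formula

$$\|f\|_2 := \sqrt{\langle f, f \rangle} = \left(\int_{[0,1]} f(x)\overline{f(x)}\,dx\right)^{1/2} = \left(\int_{[0,1]} |f(x)|^2\,dx\right)^{1/2}.$$

Thus $\|f\|_2 \geq 0$ for all $f$. The norm $\|f\|_2$ is sometimes called the
root mean square of $f$.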


Example 16.2.6. If $f(x)$ is the function $e^{2\pi ix}$, then

$$\|f\|_2 = \left(\int_{[0,1]} e^{2\pi ix}e^{-2\pi ix}\,dx\right)^{1/2} = \left(\int_{[0,1]} 1\,dx\right)^{1/2} = 1^{1/2} = 1.$$


This $L^2$ norm is related to, but is distinct from, the $L^\infty$ norm
$\|f\|_\infty := \sup_{x \in \mathbf{R}} |f(x)|$. For instance, if $f(x) = \sin(2\pi x)$, then
$\|f\|_\infty = 1$ but $\|f\|_2 = \frac{1}{\sqrt{2}}$. In general, the best one can say is
that $0 \leq \|f\|_2 \leq \|f\|_\infty$; see Exercise 16.2.3.
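The gap between the two norms can be seen numerically for the $\mathbf{Z}$-periodic function $\sin(2\pi x)$, whose sup norm is $1$ and whose $L^2$ norm is $1/\sqrt{2}$. The Python sketch below is our own illustration (the names `l2_norm` and `sup_norm` are hypothetical); both quantities are approximated on a uniform grid, so the sup norm in particular is only bounded from below by the computed value.

```python
import math

def l2_norm(f, samples=10000):
    """Riemann-sum approximation of (integral over [0,1] of |f|^2)^(1/2)."""
    mean_square = sum(abs(f(k / samples)) ** 2 for k in range(samples)) / samples
    return math.sqrt(mean_square)

def sup_norm(f, samples=10000):
    """Largest value of |f| on the sample grid (a lower bound for the sup)."""
    return max(abs(f(k / samples)) for k in range(samples))

f = lambda x: math.sin(2 * math.pi * x)
print(l2_norm(f), 1 / math.sqrt(2))   # the two values should nearly agree
print(sup_norm(f))                    # close to 1
```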
Some basic properties of the $L^2$ norm are given below.


Lemma 16.2.7. Let $f, g \in C(\mathbf{R}/\mathbf{Z}; \mathbf{C})$.

(a) (Non-degeneracy) We have $\|f\|_2 = 0$ if and only if $f = 0$.

(b) (Cauchy-Schwarz inequality) We have $|\langle f, g \rangle| \leq \|f\|_2\|g\|_2$.

(c) (Triangle inequality) We have $\|f + g\|_2 \leq \|f\|_2 + \|g\|_2$.

(d) (Pythagoras' theorem) If $\langle f, g \rangle = 0$, then $\|f + g\|_2^2 = \|f\|_2^2 +
\|g\|_2^2$.

(e) (Homogeneity) We have $\|cf\|_2 = |c|\|f\|_2$ for all $c \in \mathbf{C}$.

Proof. See Exercise 16.2.4. □


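In light of Pythagoras' theorem, we sometimes say that $f$ and
$g$ are orthogonal iff $\langle f, g \rangle = 0$.

We can now define the $L^2$ metric $d_{L^2}$ on $C(\mathbf{R}/\mathbf{Z}; \mathbf{C})$ by defining

$$d_{L^2}(f, g) := \|f - g\|_2 = \left(\int_{[0,1]} |f(x) - g(x)|^2\,dx\right)^{1/2}.$$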




Remark 16.2.8. One can verify that $d_{L^2}$ is indeed a metric
(Exercise 16.2.2). Indeed, the $L^2$ metric is very similar to the $l^2$ metric
on Euclidean spaces $\mathbf{R}^n$, which is why the notation is deliberately
chosen to be similar; you should compare the two metrics yourself
to see the analogy.


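Note that a sequence $f_n$ of functions in $C(\mathbf{R}/\mathbf{Z}; \mathbf{C})$ will
converge in the $L^2$ metric to $f \in C(\mathbf{R}/\mathbf{Z}; \mathbf{C})$ if $d_{L^2}(f_n, f) \to 0$ as
$n \to \infty$, or in other words that

$$\lim_{n \to \infty} \int_{[0,1]} |f_n(x) - f(x)|^2\,dx = 0.$$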


Remark 16.2.9. The notion of convergence in $L^2$ metric is
different from that of uniform or pointwise convergence; see Exercise
16.2.6.


Remark 16.2.10. The $L^2$ metric is not as well-behaved as the
$L^\infty$ metric. For instance, it turns out the space $C(\mathbf{R}/\mathbf{Z}; \mathbf{C})$ is
not complete in the $L^2$ metric, despite being complete in the $L^\infty$
metric; see Exercise 16.2.5.


Exercise 16.2.1. Prove Lemma 16.2.5. (Hint: the last part of (b) is a
little tricky. You may need to prove by contradiction, assuming that $f$ is
not the zero function, and then show that $\int_{[0,1]} |f(x)|^2\,dx$ is strictly positive.
You will need to use the fact that $f$, and hence $|f|$, is continuous, to do
this.)

Exercise 16.2.2. Prove that the $L^2$ metric $d_{L^2}$ on $C(\mathbf{R}/\mathbf{Z}; \mathbf{C})$ does indeed
turn $C(\mathbf{R}/\mathbf{Z}; \mathbf{C})$ into a metric space (cf. Exercise 12.1.6).


Exercise 16.2.3. If $f \in C(\mathbf{R}/\mathbf{Z}; \mathbf{C})$ is a non-zero function, show that
$0 < \|f\|_2 \leq \|f\|_\infty$. Conversely, if $0 < A \leq B$ are real numbers, show
that there exists a non-zero function $f \in C(\mathbf{R}/\mathbf{Z}; \mathbf{C})$ such that $\|f\|_2 =
A$ and $\|f\|_\infty = B$. (Hint: let $g$ be a non-constant non-negative
real-valued function in $C(\mathbf{R}/\mathbf{Z}; \mathbf{C})$, and consider functions $f$ of the form
$f = (c + dg)^{1/2}$ for some constant real numbers $c, d > 0$.)


Exercise 16.2.4. Prove Lemma 16.2.7. (Hint: use Lemma 16.2.5
frequently. For the Cauchy-Schwarz inequality, begin with the positivity
property $\langle f, f \rangle \geq 0$, but with $f$ replaced by the function $f\|g\|_2^2 - \langle f, g \rangle g$,
and then simplify using Lemma 16.2.5. You may have to treat the case
$\|g\|_2 = 0$ separately. Use the Cauchy-Schwarz inequality to prove the
triangle inequality.)

Exercise 16.2.5. Find a sequence of continuous periodic functions which
converge in $L^2$ to a discontinuous periodic function. (Hint: try
converging to the square wave function.)

Exercise 16.2.6. Let $f \in C(\mathbf{R}/\mathbf{Z}; \mathbf{C})$, and let $(f_n)_{n=1}^\infty$ be a sequence of
functions in $C(\mathbf{R}/\mathbf{Z}; \mathbf{C})$.

(a) Show that if $f_n$ converges uniformly to $f$, then $f_n$ also converges
to $f$ in the $L^2$ metric.

(b) Give an example where $f_n$ converges to $f$ in the $L^2$ metric, but
does not converge to $f$ uniformly. (Hint: take $f = 0$. Try to make
the functions $f_n$ large in sup norm.)

(c) Give an example where $f_n$ converges to $f$ in the $L^2$ metric, but
does not converge to $f$ pointwise. (Hint: take $f = 0$. Try to make
the functions $f_n$ large at one point.)

(d) Give an example where $f_n$ converges to $f$ pointwise, but does not
converge to $f$ in the $L^2$ metric. (Hint: take $f = 0$. Try to make
the functions $f_n$ large in $L^2$ norm.)


16.3 Trigonometric polynomials 


We now define the concept of a trigonometric polynomial. Just
as polynomials are combinations of the functions $x^n$ (sometimes
called monomials), trigonometric polynomials are combinations of
the functions $e^{2\pi inx}$ (sometimes called characters).


Definition 16.3.1 (Characters). For every integer $n$, we let $e_n \in
C(\mathbf{R}/\mathbf{Z}; \mathbf{C})$ denote the function

$$e_n(x) := e^{2\pi inx}.$$

This is sometimes referred to as the character with frequency $n$.

Definition 16.3.2 (Trigonometric polynomials). A function $f$
in $C(\mathbf{R}/\mathbf{Z}; \mathbf{C})$ is said to be a trigonometric polynomial if we can
write $f = \sum_{n=-N}^N c_n e_n$ for some integer $N \geq 0$ and some complex
numbers $(c_n)_{n=-N}^N$.




Example 16.3.3. The function $f = 4e_{-2} + ie_{-1} - 2e_0 + 0e_1 - 3e_2$
is a trigonometric polynomial; it can be written more explicitly as

$$f(x) = 4e^{-4\pi ix} + ie^{-2\pi ix} - 2 - 3e^{4\pi ix}.$$


Example 16.3.4. For any integer $n$, the function $\cos(2\pi nx)$ is a
trigonometric polynomial, since

$$\cos(2\pi nx) = \frac{e^{2\pi inx} + e^{-2\pi inx}}{2} = \frac{1}{2}e_{-n} + \frac{1}{2}e_n.$$

Similarly the function $\sin(2\pi nx) = \frac{-1}{2i}e_{-n} + \frac{1}{2i}e_n$ is a trigonometric
polynomial. In fact, any linear combination of sines and cosines
is also a trigonometric polynomial, for instance $3 + i\cos(2\pi x) +
4i\sin(4\pi x)$ is a trigonometric polynomial.


The Fourier theorem will allow us to write any function in 
C(R/Z; C) as a Fourier series, which is to trigonometric polyno- 
mials what power series is to polynomials. To do this we will use 
the inner product structure from the previous section. The key 
computation is 


Lemma 16.3.5 (Characters are an orthonormal system). For
any integers $n$ and $m$, we have $\langle e_n, e_m \rangle = 1$ when $n = m$ and
$\langle e_n, e_m \rangle = 0$ when $n \neq m$. Also, we have $\|e_n\|_2 = 1$.


Proof. See Exercise 16.3.2. □
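The orthonormality of the characters can be observed numerically before it is proved. The Python sketch below is our own illustration (the names `character`, `inner_product` and `gram` are hypothetical); it tabulates Riemann-sum approximations to $\langle e_n, e_m \rangle$ for $-3 \leq n, m \leq 3$. With equally spaced sample points, the Riemann sum of $e_n\overline{e_m}$ is a sum of roots of unity, so it already matches the exact integral up to rounding error whenever $|n - m|$ is smaller than the number of samples.

```python
import cmath

def character(n):
    """The character e_n(x) = exp(2*pi*i*n*x)."""
    return lambda x: cmath.exp(2j * cmath.pi * n * x)

def inner_product(f, g, samples=256):
    """Left-endpoint Riemann sum for the integral of f * conj(g) over [0, 1]."""
    return sum(f(k / samples) * g(k / samples).conjugate()
               for k in range(samples)) / samples

# Table of approximate inner products <e_n, e_m>.
gram = {(n, m): inner_product(character(n), character(m))
        for n in range(-3, 4) for m in range(-3, 4)}
print(abs(gram[(2, 2)]), abs(gram[(2, -1)]))   # close to 1 and 0
```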


As a consequence, we have a formula for the coefficients of a
trigonometric polynomial.


Corollary 16.3.6. Let $f = \sum_{n=-N}^N c_n e_n$ be a trigonometric
polynomial. Then we have the formula

$$c_n = \langle f, e_n \rangle$$

for all integers $-N \leq n \leq N$. Also, we have $0 = \langle f, e_n \rangle$ whenever
$n > N$ or $n < -N$. Also, we have the identity

$$\|f\|_2^2 = \sum_{n=-N}^N |c_n|^2.$$




Proof. See Exercise 16.3.3. □

We rewrite the conclusion of this corollary in a different way.


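Definition 16.3.7 (Fourier transform). For any function $f \in
C(\mathbf{R}/\mathbf{Z}; \mathbf{C})$, and any integer $n \in \mathbf{Z}$, we define the $n^{\text{th}}$ Fourier
coefficient of $f$, denoted $\hat{f}(n)$, by the formula

$$\hat{f}(n) := \langle f, e_n \rangle = \int_{[0,1]} f(x)e^{-2\pi inx}\,dx.$$

The function $\hat{f} : \mathbf{Z} \to \mathbf{C}$ is called the Fourier transform of $f$.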


From Corollary 16.3.6, we see that whenever $f = \sum_{n=-N}^N c_n e_n$
is a trigonometric polynomial, we have

$$f = \sum_{n=-N}^N \langle f, e_n \rangle e_n = \sum_{n=-\infty}^\infty \langle f, e_n \rangle e_n$$

and in particular we have the Fourier inversion formula

$$f = \sum_{n=-\infty}^\infty \hat{f}(n)e_n$$

or in other words

$$f(x) = \sum_{n=-\infty}^\infty \hat{f}(n)e^{2\pi inx}.$$

The right-hand side is referred to as the Fourier series of $f$. Also,
from the second identity of Corollary 16.3.6 we have the Plancherel
formula

$$\|f\|_2^2 = \sum_{n=-\infty}^\infty |\hat{f}(n)|^2.$$

Remark 16.3.8. We stress that at present we have only proven
the Fourier inversion and Plancherel formulae in the case when
$f$ is a trigonometric polynomial. Note that in this case the
Fourier coefficients $\hat{f}(n)$ are mostly zero (indeed, they can only
be non-zero when $-N \leq n \leq N$), and so this infinite sum is
really just a finite sum in disguise. In particular there are no
issues about what sense the above series converge in; they both
converge pointwise, uniformly, and in $L^2$ metric, since they are
just finite sums.
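For a concrete trigonometric polynomial, both formulae can be checked by direct computation. The Python sketch below is our own illustration (the names `coeffs`, `fourier_coefficient`, `recovered`, `norm_squared` and `plancherel_sum` are hypothetical). It takes the coefficients $c_{-2} = 4$, $c_{-1} = i$, $c_0 = -2$, $c_1 = 0$, $c_2 = -3$, as in Example 16.3.3, recovers them as Riemann-sum approximations to $\langle f, e_n \rangle$, and compares $\|f\|_2^2$ with $\sum_n |\hat{f}(n)|^2$; both should be close to $16 + 1 + 4 + 0 + 9 = 30$.

```python
import cmath

# Coefficients c_n of the trigonometric polynomial f = sum of c_n e_n.
coeffs = {-2: 4, -1: 1j, 0: -2, 1: 0, 2: -3}

def f(x):
    return sum(c * cmath.exp(2j * cmath.pi * n * x) for n, c in coeffs.items())

def fourier_coefficient(func, n, samples=64):
    """Riemann sum for the integral over [0,1] of func(x) e^{-2 pi i n x} dx."""
    return sum(func(k / samples) * cmath.exp(-2j * cmath.pi * n * k / samples)
               for k in range(samples)) / samples

recovered = {n: fourier_coefficient(f, n) for n in range(-4, 5)}
norm_squared = sum(abs(f(k / 64)) ** 2 for k in range(64)) / 64
plancherel_sum = sum(abs(c) ** 2 for c in recovered.values())
print(norm_squared, plancherel_sum)   # both close to 30
```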


In the next few sections we will extend the Fourier inversion 
and Plancherel formulae to general functions in C(R/Z;C), not 
just trigonometric polynomials. (It is also possible to extend the 
formula to discontinuous functions such as the square wave, but we 
will not do so here). To do this we will need a version of the Weier- 
strass approximation theorem, this time requiring that a continu- 
ous periodic function be approximated uniformly by trigonometric 
polynomials. Just as convolutions were used in the proof of the 
polynomial Weierstrass approximation theorem, we will also need 
a notion of convolution tailored for periodic functions. 


Exercise 16.3.1. Show that the sum or product of any two trigonometric 
polynomials is again a trigonometric polynomial. 


Exercise 16.3.2. Prove Lemma 16.3.5. 


Exercise 16.3.3. Prove Corollary 16.3.6. (Hint: use Lemma 16.3.5. For
the second identity, either use Pythagoras' theorem and induction, or
substitute $f = \sum_{n=-N}^N c_n e_n$ and expand everything out.)


16.4 Periodic convolutions 


The goal of this section is to prove the Weierstrass approximation 
theorem for trigonometric polynomials: 


Theorem 16.4.1. Let $f \in C(\mathbf{R}/\mathbf{Z}; \mathbf{C})$, and let $\varepsilon > 0$. Then there
exists a trigonometric polynomial $P$ such that $\|f - P\|_\infty \leq \varepsilon$.


This theorem asserts that any continuous periodic function
can be uniformly approximated by trigonometric polynomials. To
put it another way, if we let $P(\mathbf{R}/\mathbf{Z}; \mathbf{C})$ denote the space of all
trigonometric polynomials, then the closure of $P(\mathbf{R}/\mathbf{Z}; \mathbf{C})$ in the
$L^\infty$ metric is $C(\mathbf{R}/\mathbf{Z}; \mathbf{C})$.




It is possible to prove this theorem directly from the
Weierstrass approximation theorem for polynomials (Theorem 14.8.3),
and both theorems are a special case of a much more general
theorem known as the Stone-Weierstrass theorem, which we will
not discuss here. However we shall instead prove this theorem
from scratch, in order to introduce a couple of interesting notions,
notably that of periodic convolution. The proof here, though,
should strongly remind you of the arguments used to prove
Theorem 14.8.3.


Definition 16.4.2 (Periodic convolution). Let $f, g \in C(\mathbf{R}/\mathbf{Z}; \mathbf{C})$.
Then we define the periodic convolution $f * g : \mathbf{R} \to \mathbf{C}$ of $f$ and $g$
by the formula

$$f * g(x) := \int_{[0,1]} f(y)g(x - y)\,dy.$$


Remark 16.4.3. Note that this formula is slightly different from 
the convolution for compactly supported functions defined in De- 
finition 14.8.9, because we are only integrating over [0,1] and not 
on all of R. Thus, in principle we have given the symbol f * g two 
conflicting meanings. However, in practice there will be no confu- 
sion, because it is not possible for a non-zero function to both be 
periodic and compactly supported (Exercise 16.4.1). 


Lemma 16.4.4 (Basic properties of periodic convolution). Let
$f, g, h \in C(\mathbf{R}/\mathbf{Z}; \mathbf{C})$.


(a) (Closure) The convolution $f * g$ is continuous and $\mathbf{Z}$-periodic.
In other words, $f * g \in C(\mathbf{R}/\mathbf{Z}; \mathbf{C})$.


(b) (Commutativity) We have $f * g = g * f$.


(c) (Bilinearity) We have $f * (g + h) = f * g + f * h$ and
$(f + g) * h = f * h + g * h$. For any complex number $c$, we have
$c(f * g) = (cf) * g = f * (cg)$.

Proof. See Exercise 16.4.2. □




Now we observe an interesting identity: for any $f \in C(\mathbf{R}/\mathbf{Z}; \mathbf{C})$
and any integer $n$, we have

$$f * e_n = \hat{f}(n)e_n.$$

To prove this, we compute

$$\begin{aligned}
f * e_n(x) &= \int_{[0,1]} f(y)e^{2\pi in(x-y)}\,dy \\
&= e^{2\pi inx}\int_{[0,1]} f(y)e^{-2\pi iny}\,dy = \hat{f}(n)e^{2\pi inx} = \hat{f}(n)e_n(x)
\end{aligned}$$

as desired.

More generally, we see from Lemma 16.4.4(c) that for any
trigonometric polynomial $P = \sum_{n=-N}^N c_n e_n$, we have

$$f * P = \sum_{n=-N}^N c_n(f * e_n) = \sum_{n=-N}^N \hat{f}(n)c_n e_n.$$

Thus the periodic convolution of any function in $C(\mathbf{R}/\mathbf{Z}; \mathbf{C})$ with
a trigonometric polynomial is again a trigonometric polynomial.
(Compare with Lemma 14.8.13.)
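The identity $f * e_n = \hat{f}(n)e_n$ can be tested numerically for a particular continuous periodic function. The Python sketch below is our own illustration (the names `triangle`, `convolve` and `fourier_coefficient` are hypothetical); as the test function we take the distance from $x$ to the nearest integer, and both integrals are replaced by Riemann sums over the same sample points, so the two sides agree up to rounding error.

```python
import cmath

def triangle(x):
    """A continuous Z-periodic function: distance from x to the nearest integer."""
    return abs(x - round(x))

def convolve(f, g, x, samples=512):
    """Riemann sum for f*g(x) = integral over [0,1] of f(y) g(x - y) dy."""
    return sum(f(k / samples) * g(x - k / samples)
               for k in range(samples)) / samples

def fourier_coefficient(f, n, samples=512):
    """Riemann sum for the integral over [0,1] of f(y) e^{-2 pi i n y} dy."""
    return sum(f(k / samples) * cmath.exp(-2j * cmath.pi * n * k / samples)
               for k in range(samples)) / samples

n, x = 3, 0.3
e_n = lambda t: cmath.exp(2j * cmath.pi * n * t)
lhs = convolve(triangle, e_n, x)                 # (f * e_n)(x)
rhs = fourier_coefficient(triangle, n) * e_n(x)  # f^(n) e_n(x)
print(abs(lhs - rhs))
```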

Next, we introduce the periodic analogue of an approximation 
to the identity. 


Definition 16.4.5 (Periodic approximation to the identity). Let
$\varepsilon > 0$ and $0 < \delta < 1/2$. A function $f \in C(\mathbf{R}/\mathbf{Z}; \mathbf{C})$ is said to
be a periodic $(\varepsilon, \delta)$ approximation to the identity if the following
properties are true:


(a) $f(x) \geq 0$ for all $x \in \mathbf{R}$, and $\int_{[0,1]} f = 1$.

(b) We have $f(x) < \varepsilon$ for all $\delta \leq |x| \leq 1 - \delta$.


Now we have an analogue of Lemma 14.8.8: 


Lemma 16.4.6. For every $\varepsilon > 0$ and $0 < \delta < 1/2$, there exists
a trigonometric polynomial $P$ which is an $(\varepsilon, \delta)$ approximation to
the identity.




Proof. We sketch the proof of this lemma here, and leave the
completion of it to Exercise 16.4.3. Let $N \geq 1$ be an integer. We
define the Fejér kernel $F_N$ to be the function

$$F_N = \sum_{n=-N}^N \left(1 - \frac{|n|}{N}\right)e_n.$$

Clearly $F_N$ is a trigonometric polynomial. We observe the identity

$$F_N = \frac{1}{N}\left|\sum_{n=0}^{N-1} e_n\right|^2$$

(why?). But from the geometric series formula (Lemma 7.3.3) we
have

$$\sum_{n=0}^{N-1} e_n(x) = \frac{e_N - e_0}{e_1 - e_0} = \frac{e^{\pi i(N-1)x}\sin(\pi Nx)}{\sin(\pi x)}$$

when $x$ is not an integer (why?), and hence we have the formula

$$F_N(x) = \frac{\sin(\pi Nx)^2}{N\sin(\pi x)^2}.$$

When $x$ is an integer, the geometric series formula does not apply,
but one has $F_N(x) = N$ in that case, as one can see by direct
computation. In either case we see that $F_N(x) \geq 0$ for any $x$.
Also, we have

$$\int_{[0,1]} F_N(x)\,dx = \sum_{n=-N}^N \left(1 - \frac{|n|}{N}\right)\int_{[0,1]} e_n = \left(1 - \frac{|0|}{N}\right)1 = 1$$

(why?). Finally, since $|\sin(\pi Nx)| \leq 1$, we have

$$F_N(x) \leq \frac{1}{N\sin(\pi x)^2} \leq \frac{1}{N\sin(\pi\delta)^2}$$

whenever $\delta \leq |x| \leq 1 - \delta$ (this is because $\sin$ is increasing on
$[0, \pi/2]$ and decreasing on $[\pi/2, \pi]$). Thus by choosing $N$ large
enough, we can make $F_N(x) < \varepsilon$ for all $\delta \leq |x| \leq 1 - \delta$. □
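The two descriptions of the Fejér kernel, as the sum $\sum_{n=-N}^N (1 - |n|/N)e_n$ and as the quotient $\sin(\pi Nx)^2/(N\sin(\pi x)^2)$, can be compared numerically. The Python sketch below is our own illustration (the names `fejer_sum` and `fejer_closed_form` are hypothetical, and the tolerance used to detect integers is an arbitrary choice); agreement at a few sample points is of course no replacement for filling in the steps marked (why?).

```python
import cmath
import math

def fejer_sum(N, x):
    """F_N(x) from the definition: sum over |n| <= N of (1 - |n|/N) e_n(x)."""
    total = sum((1 - abs(n) / N) * cmath.exp(2j * cmath.pi * n * x)
                for n in range(-N, N + 1))
    return total.real  # the imaginary parts cancel in conjugate pairs

def fejer_closed_form(N, x):
    """F_N(x) = sin(pi N x)^2 / (N sin(pi x)^2), and F_N(x) = N at integers."""
    s = math.sin(math.pi * x)
    if abs(s) < 1e-12:          # x is (numerically) an integer
        return float(N)
    return math.sin(math.pi * N * x) ** 2 / (N * s * s)

for x in (0.0, 0.1, 0.37, 0.5, 0.9):
    print(x, fejer_sum(8, x), fejer_closed_form(8, x))
```

One can also average `fejer_closed_form(8, k / 100)` over $k = 0, \ldots, 99$ to see that the integral of $F_8$ over $[0, 1]$ is $1$.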




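Proof of Theorem 16.4.1. Let $f$ be any element of $C(\mathbf{R}/\mathbf{Z}; \mathbf{C})$; we
know that $f$ is bounded, so that we have some $M > 0$ such that
$|f(x)| \leq M$ for all $x \in \mathbf{R}$.

Let $\varepsilon > 0$ be arbitrary. Since $f$ is uniformly continuous, there
exists a $\delta > 0$ (which we may take to be less than $1/2$) such that
$|f(x) - f(y)| \leq \varepsilon$ whenever $|x - y| \leq \delta$.
Now use Lemma 16.4.6 to find a trigonometric polynomial $P$ which
is a $(\varepsilon, \delta)$ approximation to the identity. Then $f * P$ is also a
trigonometric polynomial. We now estimate $\|f - f * P\|_\infty$.

Let $x$ be any real number. We have

$$\begin{aligned}
|f(x) - f * P(x)| &= |f(x) - P * f(x)| \\
&= \left|f(x) - \int_{[0,1]} f(x - y)P(y)\,dy\right| \\
&= \left|\int_{[0,1]} f(x)P(y)\,dy - \int_{[0,1]} f(x - y)P(y)\,dy\right| \\
&= \left|\int_{[0,1]} (f(x) - f(x - y))P(y)\,dy\right| \\
&\leq \int_{[0,1]} |f(x) - f(x - y)|P(y)\,dy.
\end{aligned}$$

The right-hand side can be split as

$$\int_{[0,\delta]} |f(x) - f(x - y)|P(y)\,dy + \int_{[\delta,1-\delta]} |f(x) - f(x - y)|P(y)\,dy + \int_{[1-\delta,1]} |f(x) - f(x - y)|P(y)\,dy$$

which we can bound from above by

$$\begin{aligned}
&\leq \int_{[0,\delta]} \varepsilon P(y)\,dy + \int_{[\delta,1-\delta]} 2M\varepsilon\,dy + \int_{[1-\delta,1]} |f(x - 1) - f(x - y)|P(y)\,dy \\
&\leq \int_{[0,\delta]} \varepsilon P(y)\,dy + \int_{[\delta,1-\delta]} 2M\varepsilon\,dy + \int_{[1-\delta,1]} \varepsilon P(y)\,dy \\
&\leq \varepsilon + 2M\varepsilon + \varepsilon \\
&= (2M + 2)\varepsilon.
\end{aligned}$$

Thus we have $\|f - f * P\|_\infty \leq (2M + 2)\varepsilon$. Since $M$ is fixed and
$\varepsilon$ is arbitrary, we can thus make $f * P$ arbitrarily close to $f$ in
sup norm, which proves the periodic Weierstrass approximation
theorem. □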
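The approximation in this proof can be watched numerically by convolving a continuous periodic function with Fejér kernels $F_N$ of increasing order. The Python sketch below is our own illustration (the names `fejer`, `triangle`, `fejer_mean` and `sup_error` are hypothetical). It uses the closed form $\sin(\pi Nx)^2/(N\sin(\pi x)^2)$ for $F_N$, takes for $f$ the distance to the nearest integer, replaces the convolution integral by a Riemann sum, and measures the error only on a finite grid, so the printed numbers are estimates of $\|f - f * F_N\|_\infty$ rather than exact values.

```python
import math

def fejer(N, x):
    """Fejer kernel F_N(x) via its closed form (F_N = N at integers)."""
    s = math.sin(math.pi * x)
    if abs(s) < 1e-12:
        return float(N)
    return math.sin(math.pi * N * x) ** 2 / (N * s * s)

def triangle(x):
    """Continuous Z-periodic test function: distance to the nearest integer."""
    return abs(x - round(x))

def fejer_mean(f, N, x, samples=1024):
    """Riemann sum for f * F_N(x) = integral over [0,1] of f(y) F_N(x - y) dy."""
    return sum(f(k / samples) * fejer(N, x - k / samples)
               for k in range(samples)) / samples

def sup_error(f, N, grid=64):
    """Largest value of |f - f * F_N| over a uniform grid of points in [0, 1)."""
    return max(abs(f(j / grid) - fejer_mean(f, N, j / grid)) for j in range(grid))

errors = {N: sup_error(triangle, N) for N in (8, 64)}
print(errors)   # the error for N = 64 should be noticeably smaller
```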


Exercise 16.4.1. Show that if $f : \mathbf{R} \to \mathbf{C}$ is both compactly supported
and $\mathbf{Z}$-periodic, then it is identically zero.


Exercise 16.4.2. Prove Lemma 16.4.4. (Hint: to prove that $f * g$ is
continuous, you will have to do something like use the fact that $f$ is
bounded, and $g$ is uniformly continuous, or vice versa. To prove that
$f * g = g * f$, you will need to use the periodicity to "cut and paste" the
interval $[0, 1]$.)

Exercise 16.4.3. Fill in the gaps marked (why?) in Lemma 16.4.6. (Hint:
for the first identity, use the identities $|z|^2 = z\overline{z}$, $\overline{e_n} = e_{-n}$, and $e_n e_m =
e_{n+m}$.)


16.5 The Fourier and Plancherel theorems 


Using the Weierstrass approximation theorem (Theorem 16.4.1), 
we can now generalize the Fourier and Plancherel identities to 
arbitrary continuous periodic functions. 


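Theorem 16.5.1 (Fourier theorem). For any $f \in C(\mathbf{R}/\mathbf{Z}; \mathbf{C})$,
the series $\sum_{n=-\infty}^\infty \hat{f}(n)e_n$ converges in $L^2$ metric to $f$. In other
words, we have

$$\lim_{N \to \infty} \left\|f - \sum_{n=-N}^N \hat{f}(n)e_n\right\|_2 = 0.$$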


Proof. Let $\varepsilon > 0$. We have to show that there exists an $N_0$ such
that $\|f - \sum_{n=-N}^N \hat{f}(n)e_n\|_2 \leq \varepsilon$ for all $N > N_0$.

By the Weierstrass approximation theorem (Theorem 16.4.1),
we can find a trigonometric polynomial $P = \sum_{n=-N_0}^{N_0} c_n e_n$ such
that $\|f - P\|_\infty \leq \varepsilon$, for some $N_0 > 0$. In particular we have
$\|f - P\|_2 \leq \varepsilon$.

Now let $N > N_0$, and let $F_N := \sum_{n=-N}^N \hat{f}(n)e_n$. We claim
that $\|f - F_N\|_2 \leq \varepsilon$. First observe that for any $|m| \leq N$, we have

$$\langle f - F_N, e_m \rangle = \langle f, e_m \rangle - \sum_{n=-N}^N \hat{f}(n)\langle e_n, e_m \rangle = \hat{f}(m) - \hat{f}(m) = 0,$$

where we have used Lemma 16.3.5. In particular we have

$$\langle f - F_N, F_N - P \rangle = 0$$

since we can write $F_N - P$ as a linear combination of the $e_m$ for
which $|m| \leq N$. By Pythagoras' theorem we therefore have

$$\|f - P\|_2^2 = \|f - F_N\|_2^2 + \|F_N - P\|_2^2$$

and in particular

$$\|f - F_N\|_2 \leq \|f - P\|_2 \leq \varepsilon$$

as desired. □


Remark 16.5.2. Note that we have only obtained convergence
of the Fourier series $\sum_{n=-\infty}^\infty \hat{f}(n)e_n$ to $f$ in the $L^2$ metric. One
may ask whether one has convergence in the uniform or pointwise
sense as well, but it turns out (perhaps somewhat surprisingly)
that the answer is no to both of those questions. However, if
one assumes that the function $f$ is not only continuous, but is
also continuously differentiable, then one can recover pointwise
convergence; if one assumes continuously twice differentiable, then
one gets uniform convergence as well. These results are beyond
the scope of this text and will not be proven here. However,
we will prove one theorem about when one can improve the $L^2$
convergence to uniform convergence:


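Theorem 16.5.3. Let $f \in C(\mathbf{R}/\mathbf{Z}; \mathbf{C})$, and suppose that the
series $\sum_{n=-\infty}^\infty |\hat{f}(n)|$ is absolutely convergent. Then the series
$\sum_{n=-\infty}^\infty \hat{f}(n)e_n$ converges uniformly to $f$. In other words, we have

$$\lim_{N \to \infty} \left\|f - \sum_{n=-N}^N \hat{f}(n)e_n\right\|_\infty = 0.$$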




Proof. By the Weierstrass $M$-test (Theorem 14.5.7), we see that
$\sum_{n=-\infty}^\infty \hat{f}(n)e_n$ converges uniformly to some function $F$, which by Lemma
16.1.5(c) is also continuous and $\mathbf{Z}$-periodic. (Strictly speaking,
the Weierstrass $M$-test was phrased for series from $n = 1$ to
$n = \infty$, but also works for series from $n = -\infty$ to $n = +\infty$;
this can be seen by splitting the doubly infinite series into two
pieces.) Thus

$$\lim_{N \to \infty} \left\|F - \sum_{n=-N}^N \hat{f}(n)e_n\right\|_\infty = 0$$

which implies that

$$\lim_{N \to \infty} \left\|F - \sum_{n=-N}^N \hat{f}(n)e_n\right\|_2 = 0$$

since the $L^2$ norm is always less than or equal to the $L^\infty$ norm. But
the sequence $\sum_{n=-N}^N \hat{f}(n)e_n$ is already converging in $L^2$ metric to
$f$ by the Fourier theorem, so can only converge in $L^2$ metric to $F$
if $F = f$ (cf. Proposition 12.1.20). Thus $F = f$, and so we have

$$\lim_{N \to \infty} \left\|f - \sum_{n=-N}^N \hat{f}(n)e_n\right\|_\infty = 0$$

as desired. □


As a corollary of the Fourier theorem, we obtain 


Theorem 16.5.4 (Plancherel theorem). For any $f \in C(\mathbf{R}/\mathbf{Z}; \mathbf{C})$,
the series $\sum_{n=-\infty}^\infty |\hat{f}(n)|^2$ is absolutely convergent, and

$$\|f\|_2^2 = \sum_{n=-\infty}^\infty |\hat{f}(n)|^2.$$


This theorem is also known as Parseval’s theorem. 




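Proof. Let $\varepsilon > 0$. By the Fourier theorem we know that

$$\left\|f - \sum_{n=-N}^N \hat{f}(n)e_n\right\|_2 \leq \varepsilon$$

if $N$ is large enough (depending on $\varepsilon$). In particular, by the
triangle inequality this implies that

$$\|f\|_2 - \varepsilon \leq \left\|\sum_{n=-N}^N \hat{f}(n)e_n\right\|_2 \leq \|f\|_2 + \varepsilon.$$

On the other hand, by Corollary 16.3.6 we have

$$\left\|\sum_{n=-N}^N \hat{f}(n)e_n\right\|_2 = \left(\sum_{n=-N}^N |\hat{f}(n)|^2\right)^{1/2}$$

and hence

$$(\|f\|_2 - \varepsilon)^2 \leq \sum_{n=-N}^N |\hat{f}(n)|^2 \leq (\|f\|_2 + \varepsilon)^2.$$

Taking lim sup, we obtain

$$(\|f\|_2 - \varepsilon)^2 \leq \limsup_{N \to \infty} \sum_{n=-N}^N |\hat{f}(n)|^2 \leq (\|f\|_2 + \varepsilon)^2.$$

Since $\varepsilon$ is arbitrary, we thus obtain by the squeeze test that

$$\limsup_{N \to \infty} \sum_{n=-N}^N |\hat{f}(n)|^2 = \|f\|_2^2$$

(the partial sums are non-decreasing in $N$, so the lim sup is in fact a
limit), and the claim follows. □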


There are many other properties of the Fourier transform, but 
we will not develop them here. In the exercises you will see a small 
number of applications of the Fourier and Plancherel theorems. 




Exercise 16.5.1. Let $f$ be a function in $C(\mathbf{R}/\mathbf{Z}; \mathbf{C})$, and define the
trigonometric Fourier coefficients $a_n, b_n$ for $n = 0, 1, 2, 3, \ldots$ by

$$a_n := 2\int_{[0,1]} f(x)\cos(2\pi nx)\,dx; \quad b_n := 2\int_{[0,1]} f(x)\sin(2\pi nx)\,dx.$$

(a) Show that the series

$$\frac{1}{2}a_0 + \sum_{n=1}^\infty (a_n\cos(2\pi nx) + b_n\sin(2\pi nx))$$

converges in $L^2$ metric to $f$. (Hint: use the Fourier theorem, and
break up the exponentials into sines and cosines. Combine the
positive $n$ terms with the negative $n$ terms.)

(b) Show that if $\sum_{n=1}^\infty a_n$ and $\sum_{n=1}^\infty b_n$ are absolutely convergent,
then the above series actually converges uniformly to $f$, and not
just in $L^2$ metric. (Hint: use Theorem 16.5.3.)


Exercise 16.5.2. Let $f(x)$ be the function defined by $f(x) = (1 - 2x)^2$
when $x \in [0, 1)$, and extended to be $\mathbf{Z}$-periodic for the rest of the real
line.

(a) Using Exercise 16.5.1, show that the series

$$\frac{1}{3} + \sum_{n=1}^\infty \frac{4}{\pi^2 n^2}\cos(2\pi nx)$$

converges uniformly to $f$.

(b) Conclude that $\sum_{n=1}^\infty \frac{1}{n^2} = \frac{\pi^2}{6}$. (Hint: evaluate the above series at
$x = 0$.)

(c) Conclude that $\sum_{n=1}^\infty \frac{1}{n^4} = \frac{\pi^4}{90}$. (Hint: expand the cosines in terms
of exponentials, and use Plancherel's theorem.)
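The two sums $\sum_{n=1}^\infty \frac{1}{n^2} = \frac{\pi^2}{6}$ and $\sum_{n=1}^\infty \frac{1}{n^4} = \frac{\pi^4}{90}$ can be compared with partial sums on a computer. The Python sketch below is our own illustration (the names `partial_sum`, `gap2` and `gap4` are hypothetical, and the cut-off of 10000 terms is an arbitrary choice). The first gap is of size about $1/10000$, since the tail of $\sum 1/n^2$ beyond $N$ behaves like $1/N$, while the second gap is already near the limit of floating-point accuracy.

```python
import math

def partial_sum(power, terms):
    """Sum of 1/n**power for n = 1, ..., terms."""
    return sum(1.0 / n ** power for n in range(1, terms + 1))

gap2 = math.pi ** 2 / 6 - partial_sum(2, 10000)
gap4 = math.pi ** 4 / 90 - partial_sum(4, 10000)
print(gap2, gap4)
```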


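Exercise 16.5.3. If $f \in C(\mathbf{R}/\mathbf{Z}; \mathbf{C})$ and $P = \sum_{n=-N}^N c_n e_n$ is a trigonometric
polynomial, show that

$$\widehat{f * P}(n) = \hat{f}(n)c_n = \hat{f}(n)\hat{P}(n)$$

for all integers $n$ (with the convention that $c_n = 0$ when $|n| > N$). More
generally, if $f, g \in C(\mathbf{R}/\mathbf{Z}; \mathbf{C})$, show that

$$\widehat{f * g}(n) = \hat{f}(n)\hat{g}(n)$$

for all integers $n$. (A fancy way of saying this is that the Fourier
transform intertwines convolution and multiplication.)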


16.5. The Fourier and Plancherel theorems 531 


Exercise 16.5.4. Let $f \in C(\mathbf{R}/\mathbf{Z};\mathbf{C})$ be a function which is differentiable,
and whose derivative $f'$ is also continuous. Show that $f'$ also lies in
$C(\mathbf{R}/\mathbf{Z};\mathbf{C})$, and that $\widehat{f'}(n) = 2\pi i n \hat f(n)$ for all integers $n$.


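Exercise 16.5.5. Let $f, g \in C(\mathbf{R}/\mathbf{Z};\mathbf{C})$. Prove the Parseval identity

$$\Re \int_{[0,1]} f(x) \overline{g(x)}\, dx = \Re \sum_{n \in \mathbf{Z}} \hat f(n) \overline{\hat g(n)}.$$

(Hint: apply the Plancherel theorem to $f + g$ and $f - g$, and subtract
the two.) Then conclude that the real parts can be removed, thus

$$\int_{[0,1]} f(x) \overline{g(x)}\, dx = \sum_{n \in \mathbf{Z}} \hat f(n) \overline{\hat g(n)}.$$

(Hint: apply the first identity with $f$ replaced by $if$.)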


Exercise 16.5.6. In this exercise we shall develop the theory of Fourier 
series for functions of any fixed period L. 
Let $L > 0$, and let $f : \mathbf{R} \to \mathbf{C}$ be a complex-valued function which
is continuous and $L$-periodic. Define the numbers $c_n$ for every integer $n$
by
$$c_n := \frac{1}{L} \int_{[0,L)} f(x) e^{-2\pi i n x/L}\, dx.$$

(a) Show that the series
$$\sum_{n=-\infty}^{\infty} c_n e^{2\pi i n x/L}$$
converges in $L^2$ metric to $f$. More precisely, show that

$$\lim_{N \to \infty} \int_{[0,L)} \Big| f(x) - \sum_{n=-N}^{N} c_n e^{2\pi i n x/L} \Big|^2 dx = 0.$$

(Hint: apply the Fourier theorem to the function $f(Lx)$.)

(b) If the series $\sum_{n=-\infty}^{\infty} |c_n|$ is absolutely convergent, show that

$$\sum_{n=-\infty}^{\infty} c_n e^{2\pi i n x/L}$$

converges uniformly to $f$.


(c) Show that
$$\frac{1}{L} \int_{[0,L)} |f(x)|^2\, dx = \sum_{n=-\infty}^{\infty} |c_n|^2.$$

(Hint: apply the Plancherel theorem to the function $f(Lx)$.)
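
The identity in part (c) can be tested numerically on a simple example. The sketch below is an illustration under our own assumptions: the period $L = 3$, the test function `example`, the helper names and the Riemann-sum resolution `M` are all hypothetical choices, not taken from the text. For a trigonometric polynomial of low degree a uniform Riemann sum computes the integrals essentially exactly, so both sides should agree up to rounding error.

```python
import cmath
import math

def fourier_coefficient(f, L, n, M=2048):
    """Approximate c_n = (1/L) * integral over [0, L) of f(x) e^{-2 pi i n x / L} dx
    by a Riemann sum on M equally spaced points."""
    h = L / M
    total = 0j
    for k in range(M):
        x = k * h
        total += f(x) * cmath.exp(-2j * math.pi * n * x / L)
    return total * h / L

def mean_square(f, L, M=2048):
    """Approximate (1/L) * integral over [0, L) of |f(x)|^2 dx by a Riemann sum."""
    h = L / M
    return sum(abs(f(k * h)) ** 2 for k in range(M)) * h / L

PERIOD = 3.0  # hypothetical period, chosen only for illustration

def example(x):
    """A hypothetical 3-periodic test function (a trigonometric polynomial)."""
    return math.cos(2 * math.pi * x / PERIOD) + 2 * math.sin(4 * math.pi * x / PERIOD)

def coefficient_energy(f, L, max_n=5):
    """Sum of |c_n|^2 over -max_n <= n <= max_n."""
    return sum(abs(fourier_coefficient(f, L, n)) ** 2
               for n in range(-max_n, max_n + 1))

if __name__ == "__main__":
    print(coefficient_energy(example, PERIOD), mean_square(example, PERIOD))
```

For this particular test function the only non-zero coefficients are $c_{\pm 1} = 1/2$, $c_2 = -i$ and $c_{-2} = i$, so both sides equal $5/2$.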


Chapter 17 


Several variable differential calculus 


17.1 Linear transformations 


We shall now switch to a different topic, namely that of
differentiation in several variable calculus. More precisely, we shall be
dealing with maps $f : \mathbf{R}^n \to \mathbf{R}^m$ from one Euclidean space to
another, and trying to understand what the derivative of such a
map is.

Before we do so, however, we need to recall some notions from 
linear algebra, most importantly that of a linear transformation 
and a matrix. We shall be rather brief here; a more thorough 
treatment of this material can be found in any linear algebra text. 


Definition 17.1.1 (Row vectors). Let $n \ge 1$ be an integer. We
refer to elements of $\mathbf{R}^n$ as n-dimensional row vectors. A typical
n-dimensional row vector may take the form $x = (x_1, x_2, \ldots, x_n)$,
which we abbreviate as $(x_i)_{1 \le i \le n}$; the quantities $x_1, x_2, \ldots, x_n$
are of course real numbers. If $(x_i)_{1 \le i \le n}$ and $(y_i)_{1 \le i \le n}$ are
n-dimensional row vectors, we can define their vector sum by

$$(x_i)_{1 \le i \le n} + (y_i)_{1 \le i \le n} = (x_i + y_i)_{1 \le i \le n},$$

and also if $c \in \mathbf{R}$ is any scalar, we can define the scalar product
$c(x_i)_{1 \le i \le n}$ by

$$c(x_i)_{1 \le i \le n} = (c x_i)_{1 \le i \le n}.$$

Of course one has similar operations on $\mathbf{R}^m$ as well. However, if
$n \ne m$, then we do not define any operation of vector addition
between vectors in $\mathbf{R}^n$ and vectors in $\mathbf{R}^m$ (e.g., $(2,3,4) + (5,6)$ is
undefined). We also refer to the vector $(0, \ldots, 0)$ in $\mathbf{R}^n$ as the zero
vector and also denote it by $0$. (Strictly speaking, we should denote
the zero vector of $\mathbf{R}^n$ by $0_{\mathbf{R}^n}$, as they are technically distinct from
each other and from the number zero, but we shall not take care
to make this distinction). We abbreviate $(-1)x$ as $-x$.


The operations of vector addition and scalar multiplication 
obey a number of basic properties: 


Lemma 17.1.2 ($\mathbf{R}^n$ is a vector space). Let $x, y, z$ be vectors in
$\mathbf{R}^n$, and let $c, d$ be real numbers. Then we have the commutativity
property $x + y = y + x$, the additive associativity property
$(x + y) + z = x + (y + z)$, the additive identity property $x + 0 =
0 + x = x$, the additive inverse property $x + (-x) = (-x) + x = 0$,
the multiplicative associativity property $(cd)x = c(dx)$, the
distributivity properties $c(x + y) = cx + cy$ and $(c + d)x = cx + dx$, and
the multiplicative identity property $1x = x$.


Proof. See Exercise 17.1.1. □


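Definition 17.1.3 (Transpose). If $(x_i)_{1 \le i \le n} = (x_1, x_2, \ldots, x_n)$ is
an n-dimensional row vector, we can define its transpose $(x_i)^T_{1 \le i \le n}$
by

$$(x_i)^T_{1 \le i \le n} = (x_1, x_2, \ldots, x_n)^T := \begin{pmatrix} x_1 \\ x_2 \\ \vdots \\ x_n \end{pmatrix}.$$

We refer to objects such as $(x_i)^T_{1 \le i \le n}$ as n-dimensional column
vectors.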


Remark 17.1.4. There is no functional difference between a row 
vector and a column vector (e.g., one can add and scalar multiply 
column vectors just as well as we can row vectors), however we 
shall (rather annoyingly) need to transpose our row vectors into 
column vectors in order to be consistent with the conventions of 
matrix multiplication, which we will see later. Note that we view 




row vectors and column vectors as residing in different spaces; 
thus for instance we will not define the sum of a row vector with a 
column vector, even when they have the same number of elements. 


Definition 17.1.5 (Standard basis row vectors). We identify $n$
special vectors in $\mathbf{R}^n$, the standard basis row vectors $e_1, \ldots, e_n$.
For each $1 \le j \le n$, $e_j$ is the vector which has 0 in all entries
except for the $j^{th}$ entry, which is equal to 1.


For instance, in $\mathbf{R}^3$, we have $e_1 = (1,0,0)$, $e_2 = (0,1,0)$, and
$e_3 = (0,0,1)$. Note that if $x = (x_j)_{1 \le j \le n}$ is a vector in $\mathbf{R}^n$, then

$$x = x_1 e_1 + x_2 e_2 + \ldots + x_n e_n = \sum_{j=1}^{n} x_j e_j,$$

or in other words every vector in $\mathbf{R}^n$ is a linear combination of
the standard basis vectors $e_1, \ldots, e_n$. (The notation $\sum_{j=1}^{n} x_j e_j$
is unambiguous because the operation of vector addition is both
commutative and associative). Of course, just as every row vector
is a linear combination of standard basis row vectors, every column
vector is a linear combination of standard basis column vectors:

$$x^T = x_1 e_1^T + x_2 e_2^T + \ldots + x_n e_n^T = \sum_{j=1}^{n} x_j e_j^T.$$

There are (many) other ways to create a basis for $\mathbf{R}^n$, but this
is a topic for a linear algebra text and will not be discussed here.


Definition 17.1.6 (Linear transformations). A linear transformation
$T : \mathbf{R}^n \to \mathbf{R}^m$ is any function from one Euclidean space
$\mathbf{R}^n$ to another $\mathbf{R}^m$ which obeys the following two axioms:


(a) (Additivity) For every $x, x' \in \mathbf{R}^n$, we have $T(x + x') =
Tx + Tx'$.


(b) (Homogeneity) For every $x \in \mathbf{R}^n$ and every $c \in \mathbf{R}$, we have
$T(cx) = cTx$.


Example 17.1.7. The dilation operator $T_1 : \mathbf{R}^3 \to \mathbf{R}^3$ defined
by $T_1 x := 5x$ (i.e., it dilates each vector $x$ by a factor of 5) is a
linear transformation, since $5(x + x') = 5x + 5x'$ for all $x, x' \in \mathbf{R}^3$
and $5(cx) = c(5x)$ for all $x \in \mathbf{R}^3$ and $c \in \mathbf{R}$.


Example 17.1.8. The rotation operator $T_2 : \mathbf{R}^2 \to \mathbf{R}^2$ defined
by a counterclockwise rotation by $\pi/2$ radians around the origin (so that
$T_2(1,0) = (0,1)$, $T_2(0,1) = (-1,0)$, etc.) is a linear transformation;
this can best be seen geometrically rather than analytically.


Example 17.1.9. The projection operator $T_3 : \mathbf{R}^3 \to \mathbf{R}^2$ defined
by $T_3(x,y,z) := (x,y)$ is a linear transformation (why?). The
inclusion operator $T_4 : \mathbf{R}^2 \to \mathbf{R}^3$ defined by $T_4(x,y) := (x,y,0)$ is
also a linear transformation (why?). Finally, the identity operator
$I_n : \mathbf{R}^n \to \mathbf{R}^n$, defined for any $n$ by $I_n x := x$, is also a linear
transformation (why?).


As we shall shortly see, there is a connection between linear 
transformations and matrices. 


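Definition 17.1.10 (Matrices). An $m \times n$ matrix is an object $A$
of the form

$$A = \begin{pmatrix} a_{11} & a_{12} & \ldots & a_{1n} \\ a_{21} & a_{22} & \ldots & a_{2n} \\ \vdots & \vdots & \ddots & \vdots \\ a_{m1} & a_{m2} & \ldots & a_{mn} \end{pmatrix};$$

we shall abbreviate this as
$$A = (a_{ij})_{1 \le i \le m; 1 \le j \le n}.$$

In particular, n-dimensional row vectors are $1 \times n$ matrices, while
n-dimensional column vectors are $n \times 1$ matrices.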


Definition 17.1.11 (Matrix product). Given an $m \times n$ matrix $A$
and an $n \times p$ matrix $B$, we can define the matrix product $AB$ to
be the $m \times p$ matrix defined as

$$(a_{ij})_{1 \le i \le m; 1 \le j \le n} (b_{jk})_{1 \le j \le n; 1 \le k \le p} = \Big( \sum_{j=1}^{n} a_{ij} b_{jk} \Big)_{1 \le i \le m; 1 \le k \le p}.$$

In particular, if $x^T = (x_j)^T_{1 \le j \le n}$ is an n-dimensional column
vector, and $A = (a_{ij})_{1 \le i \le m; 1 \le j \le n}$ is an $m \times n$ matrix, then $Ax^T$ is an
m-dimensional column vector:

$$Ax^T = \Big( \sum_{j=1}^{n} a_{ij} x_j \Big)^T_{1 \le i \le m}.$$

We now relate matrices to linear transformations. If $A$ is an
$m \times n$ matrix, we can define the transformation $L_A : \mathbf{R}^n \to \mathbf{R}^m$
by the formula
$$(L_A x)^T := Ax^T.$$


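Example 17.1.12. If $A$ is the matrix

$$A = \begin{pmatrix} 1 & 2 & 3 \\ 4 & 5 & 6 \end{pmatrix}$$

and $x = (x_1, x_2, x_3)$ is a 3-dimensional row vector, then $L_A x$ is
the 2-dimensional row vector defined by

$$(L_A x)^T = \begin{pmatrix} 1 & 2 & 3 \\ 4 & 5 & 6 \end{pmatrix} \begin{pmatrix} x_1 \\ x_2 \\ x_3 \end{pmatrix} = \begin{pmatrix} x_1 + 2x_2 + 3x_3 \\ 4x_1 + 5x_2 + 6x_3 \end{pmatrix}$$

or in other words
$$L_A(x_1, x_2, x_3) = (x_1 + 2x_2 + 3x_3, 4x_1 + 5x_2 + 6x_3).$$

More generally, if

$$A = \begin{pmatrix} a_{11} & a_{12} & \ldots & a_{1n} \\ a_{21} & a_{22} & \ldots & a_{2n} \\ \vdots & \vdots & \ddots & \vdots \\ a_{m1} & a_{m2} & \ldots & a_{mn} \end{pmatrix},$$

then we have

$$L_A (x_j)_{1 \le j \le n} = \Big( \sum_{j=1}^{n} a_{ij} x_j \Big)_{1 \le i \le m}.$$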
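
The formula for $L_A$ translates directly into code. The following Python sketch is one possible rendering, with row vectors stored as tuples and a matrix stored as a list of rows; the names `apply_matrix`, `add` and `scale` are our own and not from the text. It reproduces Example 17.1.12 and spot-checks additivity and homogeneity on sample vectors (a check on examples, not a proof).

```python
def apply_matrix(A, x):
    """Return L_A x, whose i-th entry is sum_j a_ij * x_j.

    A is a list of m rows, each of length n; x is a sequence of length n.
    """
    if any(len(row) != len(x) for row in A):
        raise ValueError("each row of A must have as many entries as x")
    return tuple(sum(a * xj for a, xj in zip(row, x)) for row in A)

def add(x, y):
    """Vector sum of two row vectors of the same length."""
    return tuple(a + b for a, b in zip(x, y))

def scale(c, x):
    """Scalar product c * x."""
    return tuple(c * a for a in x)

# The matrix of Example 17.1.12.
A = [[1, 2, 3],
     [4, 5, 6]]

if __name__ == "__main__":
    x, y = (1, 2, 3), (-4, 0, 5)
    print(apply_matrix(A, x))
    # Additivity and homogeneity on sample inputs.
    print(apply_matrix(A, add(x, y)) == add(apply_matrix(A, x), apply_matrix(A, y)))
    print(apply_matrix(A, scale(7, x)) == scale(7, apply_matrix(A, x)))
```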


For any $m \times n$ matrix $A$, the transformation $L_A$ is automatically
linear; one can easily verify that $L_A(x + y) = L_A x + L_A y$ and
$L_A(cx) = c(L_A x)$ for any n-dimensional row vectors $x, y$ and any
scalar $c$. (Why?)


Perhaps surprisingly, the converse is also true, i.e., every linear
transformation from $\mathbf{R}^n$ to $\mathbf{R}^m$ is given by a matrix:


Lemma 17.1.13. Let $T : \mathbf{R}^n \to \mathbf{R}^m$ be a linear transformation.
Then there exists exactly one $m \times n$ matrix $A$ such that $T = L_A$.


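Proof. Suppose $T : \mathbf{R}^n \to \mathbf{R}^m$ is a linear transformation. Let
$e_1, e_2, \ldots, e_n$ be the standard basis row vectors of $\mathbf{R}^n$. Then
$Te_1, Te_2, \ldots, Te_n$ are vectors in $\mathbf{R}^m$. For each $1 \le j \le n$, we
write $Te_j$ in co-ordinates as

$$Te_j = (a_{1j}, a_{2j}, \ldots, a_{mj}) = (a_{ij})_{1 \le i \le m},$$

i.e., we define $a_{ij}$ to be the $i^{th}$ component of $Te_j$. Then for any
n-dimensional row vector $x = (x_1, \ldots, x_n)$, we have

$$Tx = T\Big( \sum_{j=1}^{n} x_j e_j \Big),$$
which (since $T$ is linear) is equal to

$$= \sum_{j=1}^{n} T(x_j e_j) = \sum_{j=1}^{n} x_j Te_j = \sum_{j=1}^{n} x_j (a_{ij})_{1 \le i \le m}$$

$$= \sum_{j=1}^{n} (a_{ij} x_j)_{1 \le i \le m}$$

$$= \Big( \sum_{j=1}^{n} a_{ij} x_j \Big)_{1 \le i \le m}.$$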

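But if we let $A$ be the matrix

$$A = \begin{pmatrix} a_{11} & a_{12} & \ldots & a_{1n} \\ a_{21} & a_{22} & \ldots & a_{2n} \\ \vdots & \vdots & \ddots & \vdots \\ a_{m1} & a_{m2} & \ldots & a_{mn} \end{pmatrix}$$

then the previous vector is precisely $L_A x$. Thus $Tx = L_A x$ for all
n-dimensional vectors $x$, and thus $T = L_A$.

Now we show that $A$ is unique, i.e., there does not exist any
other matrix

$$B = \begin{pmatrix} b_{11} & b_{12} & \ldots & b_{1n} \\ b_{21} & b_{22} & \ldots & b_{2n} \\ \vdots & \vdots & \ddots & \vdots \\ b_{m1} & b_{m2} & \ldots & b_{mn} \end{pmatrix}$$

for which $T$ is equal to $L_B$. Suppose for sake of contradiction that
we could find such a matrix $B$ which was different from $A$. Then
we would have $L_A = L_B$. In particular, we have $L_A e_j = L_B e_j$ for
every $1 \le j \le n$. But from the definition of $L_A$ we see that

$$L_A e_j = (a_{ij})_{1 \le i \le m}$$

and
$$L_B e_j = (b_{ij})_{1 \le i \le m}$$

and thus we have $a_{ij} = b_{ij}$ for every $1 \le i \le m$ and $1 \le j \le n$,
thus $A$ and $B$ are equal, a contradiction. □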
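
The proof is constructive: the $j^{th}$ column of $A$ is simply $Te_j$. The Python sketch below follows that recipe for the rotation of Example 17.1.8 and the projection of Example 17.1.9. The helper names (`standard_basis`, `matrix_of`, `apply_matrix`) and the tuple representation of row vectors are our own conventions, introduced only for illustration.

```python
def standard_basis(n):
    """The standard basis row vectors e_1, ..., e_n of R^n."""
    return [tuple(1 if i == j else 0 for i in range(n)) for j in range(n)]

def matrix_of(T, n):
    """Return the m x n matrix A (as a list of rows) with a_ij equal to
    the i-th component of T(e_j), as in the proof of Lemma 17.1.13."""
    columns = [T(e) for e in standard_basis(n)]
    m = len(columns[0])
    return [[columns[j][i] for j in range(n)] for i in range(m)]

def apply_matrix(A, x):
    """Return L_A x, whose i-th entry is sum_j a_ij * x_j."""
    return tuple(sum(a * xj for a, xj in zip(row, x)) for row in A)

def rotation(v):
    """T_2 of Example 17.1.8: sends (1,0) to (0,1) and (0,1) to (-1,0)."""
    x, y = v
    return (-y, x)

def projection(v):
    """T_3 of Example 17.1.9: drops the last co-ordinate."""
    x, y, z = v
    return (x, y)

if __name__ == "__main__":
    A = matrix_of(rotation, 2)
    print(A)
    # L_A agrees with the original transformation on a sample vector.
    print(apply_matrix(A, (3, 4)), rotation((3, 4)))
    print(matrix_of(projection, 3))
```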


Remark 17.1.14. Lemma 17.1.13 establishes a one-to-one
correspondence between linear transformations and matrices, and is
one of the fundamental reasons why matrices are so important in
linear algebra. One may ask then why we bother dealing with
linear transformations at all, and why we don't just work with
matrices all the time. The reason is that sometimes one does not
want to work with the standard basis $e_1, \ldots, e_n$, but instead wants
to use some other basis. In that case, the correspondence between
linear transformations and matrices changes, and so it is still
important to keep the notions of linear transformation and matrix
distinct. More discussion on this somewhat subtle issue can be
found in any linear algebra text.


Remark 17.1.15. If $T = L_A$, then $A$ is sometimes called the
matrix representation of $T$, and is sometimes denoted $A = [T]$.
We shall avoid this notation here, however.


The composition of two linear transformations is again a linear 
transformation (Exercise 17.1.2). The next lemma shows that the 
operation of composing linear transformations is connected to that 
of matrix multiplication. 


Lemma 17.1.16. Let $A$ be an $m \times n$ matrix, and let $B$ be an
$n \times p$ matrix. Then $L_A L_B = L_{AB}$.


Proof. See Exercise 17.1.3. □
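
Before proving Lemma 17.1.16 it may help to see it on a concrete pair of matrices. In the Python sketch below, the matrices `A` and `B` and the test vector are arbitrary choices of ours, and `mat_mul` and `apply_matrix` are hypothetical helper names; the check compares $L_A(L_B x)$ with $L_{AB} x$ for one vector $x$ and is an illustration rather than a proof.

```python
def mat_mul(A, B):
    """Matrix product of an m x n matrix A and an n x p matrix B
    (both stored as lists of rows), following Definition 17.1.11."""
    n, p = len(B), len(B[0])
    if any(len(row) != n for row in A):
        raise ValueError("A must have as many columns as B has rows")
    return [[sum(A[i][j] * B[j][k] for j in range(n)) for k in range(p)]
            for i in range(len(A))]

def apply_matrix(A, x):
    """Return L_A x, whose i-th entry is sum_j a_ij * x_j."""
    return tuple(sum(a * xj for a, xj in zip(row, x)) for row in A)

A = [[1, 2, 3],
     [4, 5, 6]]   # a 2 x 3 matrix
B = [[1, 0],
     [2, 1],
     [0, 3]]      # a 3 x 2 matrix

if __name__ == "__main__":
    x = (5, -1)
    print(apply_matrix(A, apply_matrix(B, x)))  # L_A (L_B x)
    print(apply_matrix(mat_mul(A, B), x))       # L_{AB} x
```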


Exercise 17.1.1. Prove Lemma 17.1.2. 


Exercise 17.1.2. If $T : \mathbf{R}^n \to \mathbf{R}^m$ is a linear transformation, and
$S : \mathbf{R}^p \to \mathbf{R}^n$ is a linear transformation, show that the composition
$TS : \mathbf{R}^p \to \mathbf{R}^m$ of the two transforms, defined by $TS(x) := T(S(x))$, is also
a linear transformation. (Hint: expand $TS(x + y)$ and $TS(cx)$ carefully,
using plenty of parentheses.)

Exercise 17.1.3. Prove Lemma 17.1.16. 

Exercise 17.1.4. Let $T : \mathbf{R}^n \to \mathbf{R}^m$ be a linear transformation. Show
that there exists a number $M > 0$ such that $\|Tx\| \le M\|x\|$ for all
$x \in \mathbf{R}^n$. (Hint: use Lemma 17.1.13 to write $T$ in terms of a matrix
$A$, and then set $M$ to be the sum of the absolute values of all the
entries in $A$. Use the triangle inequality often - it's easier than messing
around with square roots etc.) Conclude in particular that every linear
transformation from $\mathbf{R}^n$ to $\mathbf{R}^m$ is continuous.
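
The bound suggested in the hint of Exercise 17.1.4 can be tested empirically before attempting the proof. The Python sketch below draws random vectors and checks $\|L_A x\| \le M\|x\|$ with $M$ equal to the sum of the absolute values of the entries of $A$. The helper names, the sampling range, the fixed seed and the small floating-point tolerance are our own assumptions; passing such a test is evidence, not a proof.

```python
import math
import random

def norm(v):
    """Euclidean length of a vector."""
    return math.sqrt(sum(c * c for c in v))

def apply_matrix(A, x):
    """Return L_A x, whose i-th entry is sum_j a_ij * x_j."""
    return tuple(sum(a * xj for a, xj in zip(row, x)) for row in A)

def entry_sum_bound(A):
    """The candidate constant M: the sum of |a_ij| over all entries."""
    return sum(abs(a) for row in A for a in row)

def bound_holds(A, trials=1000, seed=0):
    """Check ||L_A x|| <= M ||x|| on randomly sampled vectors x."""
    rng = random.Random(seed)
    M = entry_sum_bound(A)
    n = len(A[0])
    for _ in range(trials):
        x = [rng.uniform(-10.0, 10.0) for _ in range(n)]
        if norm(apply_matrix(A, x)) > M * norm(x) + 1e-12:
            return False
    return True

if __name__ == "__main__":
    A = [[1, 2, 3], [4, 5, 6]]
    print(entry_sum_bound(A), bound_holds(A))
```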


17.2 Derivatives in several variable calculus 


Now that we've reviewed some linear algebra, we turn now to our
main topic of this chapter, which is that of understanding
differentiation of functions of the form $f : \mathbf{R}^n \to \mathbf{R}^m$, i.e., functions from
one Euclidean space to another. For instance, one might want to
differentiate the function $f : \mathbf{R}^3 \to \mathbf{R}^4$ defined by

$$f(x, y, z) = (xy, yz, xz, xyz).$$


In single variable calculus, when one wants to differentiate a
function $f : E \to \mathbf{R}$ at a point $x_0$, where $E$ is a subset of $\mathbf{R}$ that
contains $x_0$, this is given by

$$f'(x_0) := \lim_{x \to x_0; x \in E \backslash \{x_0\}} \frac{f(x) - f(x_0)}{x - x_0}.$$

One could try to mimic this definition in the several variable case
$f : E \to \mathbf{R}^m$, where $E$ is now a subset of $\mathbf{R}^n$; however, we
encounter a difficulty in this case: the quantity $f(x) - f(x_0)$ will live
in $\mathbf{R}^m$, and $x - x_0$ lives in $\mathbf{R}^n$, and we do not know how to divide
an m-dimensional vector by an n-dimensional vector.

To get around this problem, we first rewrite the concept of
derivative (in one dimension) in a way which does not involve
division of vectors. Instead, we view differentiability at a point $x_0$
as an assertion that a function $f$ is "approximately linear" near
$x_0$.


Lemma 17.2.1. Let $E$ be a subset of $\mathbf{R}$, $f : E \to \mathbf{R}$ be a function,
$x_0 \in E$, and $L \in \mathbf{R}$. Then the following two statements are
equivalent.


(a) $f$ is differentiable at $x_0$, and $f'(x_0) = L$.

(b) We have $\lim_{x \to x_0; x \in E - \{x_0\}} \frac{|f(x) - (f(x_0) + L(x - x_0))|}{|x - x_0|} = 0$.


Proof. See Exercise 17.2.1. □


In light of the above lemma, we see that the derivative $f'(x_0)$
can be interpreted as the number $L$ for which $|f(x) - (f(x_0) +
L(x - x_0))|$ is small, in the sense that it tends to zero as $x$ tends to
$x_0$, even if we divide out by the very small number $|x - x_0|$. More
informally, the derivative is the quantity $L$ such that we have the
approximation $f(x) - f(x_0) \approx L(x - x_0)$.


This does not seem too different from the usual notion of
differentiation, but the point is that we are no longer explicitly
dividing by $x - x_0$. (We are still dividing by $|x - x_0|$, but this will
turn out to be OK). When we move to the several variable case
$f : E \to \mathbf{R}^m$, where $E \subseteq \mathbf{R}^n$, we shall still want the derivative to
be some quantity $L$ such that $f(x) - f(x_0) \approx L(x - x_0)$. However,
since $f(x) - f(x_0)$ is now an m-dimensional vector and $x - x_0$ is
an n-dimensional vector, we no longer want $L$ to be a scalar; we
want it to be a linear transformation. More precisely:


Definition 17.2.2 (Differentiability). Let $E$ be a subset of $\mathbf{R}^n$,
$f : E \to \mathbf{R}^m$ be a function, $x_0 \in E$ be a point, and let $L : \mathbf{R}^n \to
\mathbf{R}^m$ be a linear transformation. We say that $f$ is differentiable at
$x_0$ with derivative $L$ if we have

$$\lim_{x \to x_0; x \in E - \{x_0\}} \frac{\|f(x) - (f(x_0) + L(x - x_0))\|}{\|x - x_0\|} = 0.$$

Here $\|x\|$ is the length of $x$ (as measured in the $l^2$ metric):
$$\|(x_1, x_2, \ldots, x_n)\| = (x_1^2 + x_2^2 + \ldots + x_n^2)^{1/2}.$$


Example 17.2.3. Let $f : \mathbf{R}^2 \to \mathbf{R}^2$ be the map $f(x, y) :=
(x^2, y^2)$, let $x_0$ be the point $x_0 := (1, 2)$, and let $L : \mathbf{R}^2 \to \mathbf{R}^2$
be the map $L(x, y) := (2x, 4y)$. We claim that $f$ is differentiable
at $x_0$ with derivative $L$. To see this, we compute

$$\lim_{(x,y) \to (1,2): (x,y) \ne (1,2)} \frac{\|f(x,y) - (f(1,2) + L((x,y) - (1,2)))\|}{\|(x,y) - (1,2)\|}.$$

Making the change of variables $(x, y) = (1, 2) + (a, b)$, this becomes

$$\lim_{(a,b) \to (0,0): (a,b) \ne (0,0)} \frac{\|f(1 + a, 2 + b) - (f(1,2) + L(a,b))\|}{\|(a,b)\|}.$$

Substituting the formula for $f$ and for $L$, this becomes

$$\lim_{(a,b) \to (0,0): (a,b) \ne (0,0)} \frac{\|((1 + a)^2, (2 + b)^2) - (1, 4) - (2a, 4b)\|}{\|(a,b)\|},$$

which simplifies to

$$\lim_{(a,b) \to (0,0): (a,b) \ne (0,0)} \frac{\|(a^2, b^2)\|}{\|(a,b)\|}.$$

We use the squeeze test. The expression $\frac{\|(a^2, b^2)\|}{\|(a,b)\|}$ is clearly
non-negative. On the other hand, we have by the triangle inequality

$$\|(a^2, b^2)\| \le \|(a^2, 0)\| + \|(0, b^2)\| = a^2 + b^2$$

and hence

$$\frac{\|(a^2, b^2)\|}{\|(a,b)\|} \le \sqrt{a^2 + b^2}.$$

Since $\sqrt{a^2 + b^2} \to 0$ as $(a, b) \to 0$, we thus see from the squeeze
test that the above limit exists and is equal to 0. Thus $f$ is
differentiable at $x_0$ with derivative $L$.
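
The limit in Example 17.2.3 can also be observed numerically. The Python sketch below evaluates the quotient from Definition 17.2.2 at a few small displacements $(a, b)$ and compares it with the bound $\sqrt{a^2 + b^2}$ obtained above. The function name `newton_quotient` and the sample displacements are our own choices; the computation only illustrates the limit and does not replace the squeeze-test argument.

```python
import math

def f(x, y):
    """The map f(x, y) = (x^2, y^2) of Example 17.2.3."""
    return (x * x, y * y)

def L(a, b):
    """The candidate derivative L(a, b) = (2a, 4b) at the point (1, 2)."""
    return (2 * a, 4 * b)

def norm(v):
    """Euclidean length of a vector."""
    return math.sqrt(sum(c * c for c in v))

def newton_quotient(a, b):
    """||f((1,2) + (a,b)) - (f(1,2) + L(a,b))|| / ||(a,b)|| for (a,b) != (0,0)."""
    fx, f0, lin = f(1 + a, 2 + b), f(1, 2), L(a, b)
    error = tuple(p - q - r for p, q, r in zip(fx, f0, lin))
    return norm(error) / norm((a, b))

if __name__ == "__main__":
    for t in (1e-1, 1e-2, 1e-3):
        # The quotient should stay below sqrt(a^2 + b^2) and shrink with t.
        print(t, newton_quotient(t, -t), math.hypot(t, t))
```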


As you can see, verifying that a function is differentiable from 
first principles can be somewhat tedious. Later on we shall find 
better ways to verify differentiability, and to compute derivatives. 

Before we proceed further, we have to check a basic fact, which 
is that a function can have at most one derivative at any interior 
point of its domain: 


Lemma 17.2.4 (Uniqueness of derivatives). Let $E$ be a subset of
$\mathbf{R}^n$, $f : E \to \mathbf{R}^m$ be a function, $x_0 \in E$ be an interior point of
$E$, and let $L_1 : \mathbf{R}^n \to \mathbf{R}^m$ and $L_2 : \mathbf{R}^n \to \mathbf{R}^m$ be linear
transformations. Suppose that $f$ is differentiable at $x_0$ with derivative $L_1$,
and also differentiable at $x_0$ with derivative $L_2$. Then $L_1 = L_2$.


Proof. See Exercise 17.2.2. □


Because of Lemma 17.2.4, we can now talk about the derivative
of $f$ at interior points $x_0$, and we will denote this derivative by
$f'(x_0)$. Thus $f'(x_0)$ is the unique linear transformation from $\mathbf{R}^n$
to $\mathbf{R}^m$ such that

$$\lim_{x \to x_0; x \ne x_0} \frac{\|f(x) - (f(x_0) + f'(x_0)(x - x_0))\|}{\|x - x_0\|} = 0.$$

Informally, this means that the derivative $f'(x_0)$ is the linear
transformation such that we have

$$f(x) - f(x_0) \approx f'(x_0)(x - x_0)$$
or equivalently

$$f(x) \approx f(x_0) + f'(x_0)(x - x_0)$$

(this is known as Newton's approximation, compare with
Proposition 10.1.7).

Another consequence of Lemma 17.2.4 is that if you know that
$f(x) = g(x)$ for all $x \in E$, and $f, g$ are differentiable at $x_0$, then
you also know that $f'(x_0) = g'(x_0)$ at every interior point of $E$.
However, this is not necessarily true if $x_0$ is a boundary point of $E$;
for instance, if $E$ is just a single point $E = \{x_0\}$, merely knowing
that $f(x_0) = g(x_0)$ does not imply that $f'(x_0) = g'(x_0)$. We
will not deal with these boundary issues here, and only compute
derivatives on the interior of the domain.

We will sometimes refer to $f'$ as the total derivative of $f$, to
distinguish this concept from that of partial and directional
derivatives below. The total derivative $f'$ is also closely related to the
derivative matrix $Df$, which we shall define in the next section.


Exercise 17.2.1. Prove Lemma 17.2.1. 


Exercise 17.2.2. Prove Lemma 17.2.4. (Hint: prove by contradiction. If
$L_1 \ne L_2$, then there exists a vector $v$ such that $L_1 v \ne L_2 v$; this vector
must be non-zero (why?). Now apply the definition of derivative, and
try to specialize to the case where $x = x_0 + tv$ for some scalar $t$, to
obtain a contradiction.)


17.3 Partial and directional derivatives 


We now connect the notion of differentiability with that of partial 
and directional derivatives, which we now introduce: 


Definition 17.3.1 (Directional derivative). Let $E$ be a subset of
$\mathbf{R}^n$, $f : E \to \mathbf{R}^m$ be a function, let $x_0$ be an interior point of $E$,
and let $v$ be a vector in $\mathbf{R}^n$. If the limit

$$\lim_{t \to 0; t > 0, x_0 + tv \in E} \frac{f(x_0 + tv) - f(x_0)}{t}$$

exists, we say that $f$ is differentiable in the direction $v$ at $x_0$, and
we denote the above limit by $D_v f(x_0)$:

$$D_v f(x_0) := \lim_{t \to 0; t > 0} \frac{f(x_0 + tv) - f(x_0)}{t}.$$


Remark 17.3.2. One should compare this definition with
Definition 17.2.2. Note that we are dividing by a scalar $t$, rather than
a vector, so this definition makes sense, and $D_v f(x_0)$ will be a
vector in $\mathbf{R}^m$. It is sometimes possible to also define directional
derivatives on the boundary of $E$, if the vector $v$ is pointing in an
"inward" direction (this generalizes the notion of left derivatives
and right derivatives from single variable calculus); but we will
not pursue these matters here.


Example 17.3.3. If $f : \mathbf{R} \to \mathbf{R}$ is a function, then $D_{+1} f(x)$ is
the same as the right derivative of $f(x)$ (if it exists), and similarly
$D_{-1} f(x)$ is the negative of the left derivative of $f(x)$ (if it exists).


Example 17.3.4. We use the function $f : \mathbf{R}^2 \to \mathbf{R}^2$ defined by
$f(x, y) := (x^2, y^2)$ from before, and let $x_0 := (1, 2)$ and $v := (3, 4)$.
Then

$$D_v f(x_0) = \lim_{t \to 0; t > 0} \frac{f(1 + 3t, 2 + 4t) - f(1, 2)}{t}$$
$$= \lim_{t \to 0; t > 0} \frac{(1 + 6t + 9t^2, 4 + 16t + 16t^2) - (1, 4)}{t}$$
$$= \lim_{t \to 0; t > 0} (6 + 9t, 16 + 16t) = (6, 16).$$
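
The difference quotient in Example 17.3.4 is easy to evaluate directly. The Python sketch below computes $(f(x_0 + tv) - f(x_0))/t$ for a given $t > 0$; the name `difference_quotient` and the sample values of $t$ are our own. As the computation above predicts, the quotient equals $(6 + 9t, 16 + 16t)$ and approaches $(6, 16)$ as $t$ shrinks.

```python
def f(x, y):
    """The map f(x, y) = (x^2, y^2) of Example 17.3.4."""
    return (x * x, y * y)

def difference_quotient(func, x0, v, t):
    """Return (func(x0 + t v) - func(x0)) / t, computed componentwise."""
    shifted = tuple(a + t * b for a, b in zip(x0, v))
    return tuple((p - q) / t for p, q in zip(func(*shifted), func(*x0)))

if __name__ == "__main__":
    x0, v = (1, 2), (3, 4)
    for t in (0.5, 1e-2, 1e-4, 1e-6):
        print(t, difference_quotient(f, x0, v, t))
```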

Directional derivatives are connected with total derivatives as
follows:


Lemma 17.3.5. Let $E$ be a subset of $\mathbf{R}^n$, $f : E \to \mathbf{R}^m$ be a
function, $x_0$ be an interior point of $E$, and let $v$ be a vector in
$\mathbf{R}^n$. If $f$ is differentiable at $x_0$, then $f$ is also differentiable in the
direction $v$ at $x_0$, and

$$D_v f(x_0) = f'(x_0) v.$$

Proof. See Exercise 17.3.1. □


Remark 17.3.6. One consequence of this lemma is that total 
differentiability implies directional differentiability. However, the 
converse is not true; see Exercise 17.3.3. 


Closely related to the concept of directional derivative is that 
of partial derivative: 


Definition 17.3.7 (Partial derivative). Let $E$ be a subset of $\mathbf{R}^n$,
let $f : E \to \mathbf{R}^m$ be a function, let $x_0$ be an interior point of $E$,
and let $1 \le j \le n$. Then the partial derivative of $f$ with respect
to the $x_j$ variable at $x_0$, denoted $\frac{\partial f}{\partial x_j}(x_0)$, is defined by

$$\frac{\partial f}{\partial x_j}(x_0) := \lim_{t \to 0; t \ne 0, x_0 + te_j \in E} \frac{f(x_0 + te_j) - f(x_0)}{t} = \frac{d}{dt} f(x_0 + te_j) \Big|_{t=0}$$

provided of course that the limit exists. (If the limit does not
exist, we leave $\frac{\partial f}{\partial x_j}(x_0)$ undefined).


Informally, the partial derivative can be obtained by holding
all the variables other than $x_j$ fixed, and then applying the
single-variable calculus derivative in the $x_j$ variable. Note that if $f$ takes
values in $\mathbf{R}^m$, then so will $\frac{\partial f}{\partial x_j}$. Indeed, if we write $f$ in components
as $f = (f_1, \ldots, f_m)$, it is easy to see (why?) that

$$\frac{\partial f}{\partial x_j}(x_0) = \Big( \frac{\partial f_1}{\partial x_j}(x_0), \ldots, \frac{\partial f_m}{\partial x_j}(x_0) \Big),$$

i.e., to differentiate a vector-valued function one just has to
differentiate each of the components separately.

We sometimes replace the variables $x_j$ in $\frac{\partial f}{\partial x_j}$ with other
symbols. For instance, if we are dealing with the function $f(x, y) =
(x^2, y^2)$, then we might refer to $\frac{\partial f}{\partial x}$ and $\frac{\partial f}{\partial y}$ instead of $\frac{\partial f}{\partial x_1}$ and
$\frac{\partial f}{\partial x_2}$. (In this case, $\frac{\partial f}{\partial x}(x, y) = (2x, 0)$ and $\frac{\partial f}{\partial y}(x, y) = (0, 2y)$). One
should caution, however, that one should only relabel the variables
if it is absolutely clear which symbol refers to the first variable,
which symbol refers to the second variable, etc.; otherwise one
may become unintentionally confused. For instance, in the above
example, the expression $\frac{\partial f}{\partial x}(x, x)$ is just $(2x, 0)$; however, one may
mistakenly compute

$$\frac{\partial f}{\partial x}(x, x) = \frac{\partial}{\partial x}(x^2, x^2) = (2x, 2x);$$

the problem here is that the symbol $x$ is being used for more
than just the first variable of $f$. (On the other hand, it is true
that $\frac{d}{dx} f(x, x)$ is equal to $(2x, 2x)$; thus the operation of total
differentiation $\frac{d}{dx}$ is not the same as that of partial differentiation
$\frac{\partial}{\partial x}$).

From Lemma 17.3.5, we know that if a function is differentiable
at a point $x_0$, then all the partial derivatives $\frac{\partial f}{\partial x_j}$ exist at $x_0$, and
that
$$\frac{\partial f}{\partial x_j}(x_0) = f'(x_0) e_j.$$

Also, if $v = (v_1, \ldots, v_n) = \sum_j v_j e_j$, then we have
$$D_v f(x_0) = f'(x_0) \sum_j v_j e_j = \sum_j v_j f'(x_0) e_j$$
(since $f'(x_0)$ is linear) and thus
$$D_v f(x_0) = \sum_j v_j \frac{\partial f}{\partial x_j}(x_0).$$

Thus one can write directional derivatives in terms of partial
derivatives, provided that the function is actually differentiable at that
point.

Just because the partial derivatives exist at a point $x_0$, we
cannot conclude that the function is differentiable there (Exercise
17.3.3). However, if we know that the partial derivatives not only
exist, but are continuous, then we can in fact conclude
differentiability, thanks to the following handy theorem:


Theorem 17.3.8. Let $E$ be a subset of $\mathbf{R}^n$, $f : E \to \mathbf{R}^m$ be a
function, $F$ be a subset of $E$, and $x_0$ be an interior point of $F$.
If all the partial derivatives $\frac{\partial f}{\partial x_j}$ exist on $F$ and are continuous at
$x_0$, then $f$ is differentiable at $x_0$, and the linear transformation
$f'(x_0) : \mathbf{R}^n \to \mathbf{R}^m$ is defined by

$$f'(x_0) (v_j)_{1 \le j \le n} = \sum_{j=1}^{n} v_j \frac{\partial f}{\partial x_j}(x_0).$$
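Proof. Let $L : \mathbf{R}^n \to \mathbf{R}^m$ be the linear transformation

$$L (v_j)_{1 \le j \le n} := \sum_{j=1}^{n} v_j \frac{\partial f}{\partial x_j}(x_0).$$

We have to prove that

$$\lim_{x \to x_0; x \in E - \{x_0\}} \frac{\|f(x) - (f(x_0) + L(x - x_0))\|}{\|x - x_0\|} = 0.$$

Let $\varepsilon > 0$. It will suffice to find a radius $\delta > 0$ such that

$$\frac{\|f(x) - (f(x_0) + L(x - x_0))\|}{\|x - x_0\|} \le \varepsilon$$

for all $x \in B(x_0, \delta) \backslash \{x_0\}$. Equivalently, we wish to show that

$$\|f(x) - f(x_0) - L(x - x_0)\| \le \varepsilon \|x - x_0\|$$

for all $x \in B(x_0, \delta) \backslash \{x_0\}$.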

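Because $x_0$ is an interior point of $F$, there exists a ball $B(x_0, r)$
which is contained inside $F$. Because each partial derivative $\frac{\partial f}{\partial x_j}$
is continuous at $x_0$, there thus exists an $0 < \delta_j < r$ such that
$\|\frac{\partial f}{\partial x_j}(x) - \frac{\partial f}{\partial x_j}(x_0)\| \le \varepsilon/nm$ for every $x \in B(x_0, \delta_j)$. If we take
$\delta = \min(\delta_1, \ldots, \delta_n)$, then we thus have $\|\frac{\partial f}{\partial x_j}(x) - \frac{\partial f}{\partial x_j}(x_0)\| \le \varepsilon/nm$
for every $x \in B(x_0, \delta)$ and every $1 \le j \le n$.

Let $x \in B(x_0, \delta)$. We write $x = x_0 + v_1 e_1 + v_2 e_2 + \ldots + v_n e_n$
for some scalars $v_1, \ldots, v_n$. Note that

$$\|x - x_0\| = \sqrt{v_1^2 + v_2^2 + \ldots + v_n^2}$$

and in particular we have $|v_j| \le \|x - x_0\|$ for all $1 \le j \le n$. Our
task is to show that

$$\Big\| f(x_0 + v_1 e_1 + \ldots + v_n e_n) - f(x_0) - \sum_{j=1}^{n} v_j \frac{\partial f}{\partial x_j}(x_0) \Big\| \le \varepsilon \|x - x_0\|.$$

Write $f$ in components as $f = (f_1, f_2, \ldots, f_m)$ (so each $f_i$ is a
function from $E$ to $\mathbf{R}$). From the mean value theorem in the $x_1$
variable, we see that

$$f_i(x_0 + v_1 e_1) - f_i(x_0) = \frac{\partial f_i}{\partial x_1}(x_0 + t_i e_1) v_1$$

for some $t_i$ between 0 and $v_1$. But we have

$$\Big| \frac{\partial f_i}{\partial x_1}(x_0 + t_i e_1) - \frac{\partial f_i}{\partial x_1}(x_0) \Big| \le \Big\| \frac{\partial f}{\partial x_1}(x_0 + t_i e_1) - \frac{\partial f}{\partial x_1}(x_0) \Big\| \le \varepsilon/nm$$

and hence

$$\Big| f_i(x_0 + v_1 e_1) - f_i(x_0) - \frac{\partial f_i}{\partial x_1}(x_0) v_1 \Big| \le \varepsilon |v_1| / nm.$$

Summing this over all $1 \le i \le m$ (and noting that $\|(y_1, \ldots, y_m)\| \le
|y_1| + \ldots + |y_m|$ from the triangle inequality) we obtain

$$\Big\| f(x_0 + v_1 e_1) - f(x_0) - \frac{\partial f}{\partial x_1}(x_0) v_1 \Big\| \le \varepsilon |v_1| / n;$$

since $|v_1| \le \|x - x_0\|$, we thus have

$$\Big\| f(x_0 + v_1 e_1) - f(x_0) - \frac{\partial f}{\partial x_1}(x_0) v_1 \Big\| \le \varepsilon \|x - x_0\| / n.$$

A similar argument gives

$$\Big\| f(x_0 + v_1 e_1 + v_2 e_2) - f(x_0 + v_1 e_1) - \frac{\partial f}{\partial x_2}(x_0) v_2 \Big\| \le \varepsilon \|x - x_0\| / n,$$

and so forth up to

$$\Big\| f(x_0 + v_1 e_1 + \ldots + v_n e_n) - f(x_0 + v_1 e_1 + \ldots + v_{n-1} e_{n-1}) - \frac{\partial f}{\partial x_n}(x_0) v_n \Big\| \le \varepsilon \|x - x_0\| / n.$$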


If we sum these $n$ inequalities and use the triangle inequality $\|x +
y\| \le \|x\| + \|y\|$, we obtain a telescoping series which simplifies to

$$\Big\| f(x_0 + v_1 e_1 + \ldots + v_n e_n) - f(x_0) - \sum_{j=1}^{n} \frac{\partial f}{\partial x_j}(x_0) v_j \Big\| \le \varepsilon \|x - x_0\|$$

as desired. □


From Theorem 17.3.8 and Lemma 17.3.5 we see that if the
partial derivatives of a function $f : E \to \mathbf{R}^m$ exist and are
continuous on some set $F$, then all the directional derivatives also exist
at every interior point $x_0$ of $F$, and we have the formula

$$D_{(v_1, \ldots, v_n)} f(x_0) = \sum_{j=1}^{n} v_j \frac{\partial f}{\partial x_j}(x_0).$$

In particular, if $f : E \to \mathbf{R}$ is a real-valued function, and we
define the gradient $\nabla f(x_0)$ of $f$ at $x_0$ to be the n-dimensional
row vector $\nabla f(x_0) := (\frac{\partial f}{\partial x_1}(x_0), \ldots, \frac{\partial f}{\partial x_n}(x_0))$, then we have the
familiar formula

$$D_v f(x_0) = v \cdot \nabla f(x_0)$$

whenever $x_0$ is in the interior of the region where the gradient
exists and is continuous.

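More generally, if $f : E \to \mathbf{R}^m$ is a function taking values in
$\mathbf{R}^m$, with $f = (f_1, \ldots, f_m)$, and $x_0$ is in the interior of the region
where the partial derivatives of $f$ exist and are continuous, then
we have from Theorem 17.3.8 that

$$f'(x_0) (v_j)_{1 \le j \le n} = \sum_{j=1}^{n} v_j \frac{\partial f}{\partial x_j}(x_0)$$

$$= \Big( \sum_{j=1}^{n} v_j \frac{\partial f_i}{\partial x_j}(x_0) \Big)_{i=1}^{m},$$

which we can rewrite as

$$L_{Df(x_0)} (v_j)_{1 \le j \le n}$$

where $Df(x_0)$ is the $m \times n$ matrix

$$Df(x_0) := \Big( \frac{\partial f_i}{\partial x_j}(x_0) \Big)_{1 \le i \le m; 1 \le j \le n} = \begin{pmatrix} \frac{\partial f_1}{\partial x_1}(x_0) & \frac{\partial f_1}{\partial x_2}(x_0) & \ldots & \frac{\partial f_1}{\partial x_n}(x_0) \\ \frac{\partial f_2}{\partial x_1}(x_0) & \frac{\partial f_2}{\partial x_2}(x_0) & \ldots & \frac{\partial f_2}{\partial x_n}(x_0) \\ \vdots & \vdots & \ddots & \vdots \\ \frac{\partial f_m}{\partial x_1}(x_0) & \frac{\partial f_m}{\partial x_2}(x_0) & \ldots & \frac{\partial f_m}{\partial x_n}(x_0) \end{pmatrix}.$$

Thus we have
$$(D_v f(x_0))^T = (f'(x_0) v)^T = Df(x_0) v^T.$$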


The matrix $Df(x_0)$ is sometimes also called the derivative matrix
or differential matrix of $f$ at $x_0$, and is closely related to the
total derivative $f'(x_0)$. One can also write $Df$ as

$$Df(x_0) = \Big( \frac{\partial f}{\partial x_1}(x_0)^T, \frac{\partial f}{\partial x_2}(x_0)^T, \ldots, \frac{\partial f}{\partial x_n}(x_0)^T \Big),$$

i.e., each of the columns of $Df(x_0)$ is one of the partial derivatives
of $f$, expressed as a column vector. Or one could write

$$Df(x_0) = \begin{pmatrix} \nabla f_1(x_0) \\ \nabla f_2(x_0) \\ \vdots \\ \nabla f_m(x_0) \end{pmatrix},$$

i.e., the rows of $Df(x_0)$ are the gradients of the various components of
$f$. In particular, if $f$ is scalar-valued (i.e., $m = 1$), then $Df$ is the
same as $\nabla f$.


Example 17.3.9. Let $f : \mathbf{R}^2 \to \mathbf{R}^2$ be the function $f(x, y) =
(x^2 + xy, y^2)$. Then $\frac{\partial f}{\partial x} = (2x + y, 0)$ and $\frac{\partial f}{\partial y} = (x, 2y)$. Since
these partial derivatives are continuous on $\mathbf{R}^2$, we see that $f$ is
differentiable on all of $\mathbf{R}^2$, and

$$Df(x, y) = \begin{pmatrix} 2x + y & x \\ 0 & 2y \end{pmatrix}.$$

Thus for instance, the directional derivative in the direction $(v, w)$
is
$$D_{(v,w)} f(x, y) = ((2x + y)v + xw, 2yw).$$
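
The derivative matrix in Example 17.3.9 can be cross-checked against finite differences. In the Python sketch below, `numerical_jacobian` approximates each partial derivative by a central difference quotient; this numerical scheme, the step size `h`, the sample point and the helper names are our own choices and are not part of the text. Each column of the result approximates one partial derivative of $f$, in agreement with the description of $Df$ above.

```python
def f(x, y):
    """The map f(x, y) = (x^2 + xy, y^2) of Example 17.3.9."""
    return (x * x + x * y, y * y)

def Df(x, y):
    """The derivative matrix computed in Example 17.3.9, as a list of rows."""
    return [[2 * x + y, x],
            [0.0, 2 * y]]

def numerical_jacobian(func, point, h=1e-6):
    """Approximate the derivative matrix of func at point by central differences."""
    n = len(point)
    columns = []
    for j in range(n):
        plus, minus = list(point), list(point)
        plus[j] += h
        minus[j] -= h
        # Column j approximates the partial derivative in the x_j variable.
        columns.append([(a - b) / (2 * h)
                        for a, b in zip(func(*plus), func(*minus))])
    m = len(columns[0])
    return [[columns[j][i] for j in range(n)] for i in range(m)]

def directional_derivative(J, v):
    """Return the row vector whose transpose is J v^T."""
    return tuple(sum(a * b for a, b in zip(row, v)) for row in J)

if __name__ == "__main__":
    point = (1.5, -2.0)
    print(Df(*point))
    print(numerical_jacobian(f, point))
    print(directional_derivative(Df(*point), (3, 4)))
```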


Exercise 17.3.1. Prove Lemma 17.3.5. (This will be similar to Exercise 
17.1.3). 


Exercise 17.3.2. Let $E$ be a subset of $\mathbf{R}^n$, let $f : E \to \mathbf{R}^m$ be a
function, let $x_0$ be an interior point of $E$, and let $1 \le j \le n$. Show that
$\frac{\partial f}{\partial x_j}(x_0)$ exists if and only if $D_{e_j} f(x_0)$ and $D_{-e_j} f(x_0)$ exist and are
negatives of each other (thus $D_{e_j} f(x_0) = -D_{-e_j} f(x_0)$); furthermore, one has
$\frac{\partial f}{\partial x_j}(x_0) = D_{e_j} f(x_0)$ in this case.


Exercise 17.3.3. Let $f : \mathbf{R}^2 \to \mathbf{R}$ be the function defined by $f(x, y) :=
\frac{x^3}{x^2 + y^2}$ when $(x, y) \ne (0, 0)$, and $f(0, 0) := 0$. Show that $f$ is not
differentiable at $(0, 0)$, despite being differentiable in every direction $v \in \mathbf{R}^2$
at $(0, 0)$. Explain why this does not contradict Theorem 17.3.8.


Exercise 17.3.4. Let $f : \mathbf{R}^n \to \mathbf{R}^m$ be a differentiable function such
that $f'(x) = 0$ for all $x \in \mathbf{R}^n$. Show that $f$ is constant. (Hint: you
may use the mean-value theorem or fundamental theorem of calculus
for one-dimensional functions, but bear in mind that there is no direct
analogue of these theorems for several-variable functions. I would not
advise proceeding via first principles.) For a tougher challenge, replace
the domain $\mathbf{R}^n$ by an open connected subset $\Omega$ of $\mathbf{R}^n$.


17.4 The several variable calculus chain rule 


We are now ready to state the several variable calculus chain rule.
Recall that if $f : X \to Y$ and $g : Y \to Z$ are two functions, then
the composition $g \circ f : X \to Z$ is defined by $g \circ f(x) := g(f(x))$
for all $x \in X$.


Theorem 17.4.1 (Several variable calculus chain rule). Let $E$ be
a subset of $\mathbf{R}^n$, and let $F$ be a subset of $\mathbf{R}^m$. Let $f : E \to F$ be
a function, and let $g : F \to \mathbf{R}^p$ be another function. Let $x_0$ be
a point in the interior of $E$. Suppose that $f$ is differentiable at
$x_0$, and that $f(x_0)$ is in the interior of $F$. Suppose also that $g$ is
differentiable at $f(x_0)$. Then $g \circ f : E \to \mathbf{R}^p$ is also differentiable
at $x_0$, and we have the formula

$$(g \circ f)'(x_0) = g'(f(x_0)) f'(x_0).$$

Proof. See Exercise 17.4.3. □


One should compare this theorem with the single-variable chain
rule, Theorem 10.1.15; indeed one can easily deduce the
single-variable rule as a consequence of the several-variable rule.

Intuitively, one can think of the several variable chain rule
as follows. Let $x$ be close to $x_0$. Then Newton's approximation
asserts that

$$f(x) - f(x_0) \approx f'(x_0)(x - x_0)$$

and in particular $f(x)$ is close to $f(x_0)$. Since $g$ is differentiable
at $f(x_0)$, we see from Newton's approximation again that

$$g(f(x)) - g(f(x_0)) \approx g'(f(x_0))(f(x) - f(x_0)).$$

Combining the two, we obtain

$$g \circ f(x) - g \circ f(x_0) \approx g'(f(x_0)) f'(x_0)(x - x_0)$$

which then should give $(g \circ f)'(x_0) = g'(f(x_0)) f'(x_0)$. This
argument however is rather imprecise; to make it more precise one
needs to manipulate limits rigorously; see Exercise 17.4.3.

As a corollary of the chain rule and Lemma 17.1.16 (and
Lemma 17.1.13), we see that

$$D(g \circ f)(x_0) = Dg(f(x_0)) Df(x_0);$$

i.e., we can write the chain rule in terms of matrices and matrix
multiplication, instead of in terms of linear transformations and
composition.
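
The matrix form of the chain rule can be checked on a concrete pair of maps. In the Python sketch below we take, purely as an illustration of our own choosing, $f(x, y) = (x^2 + xy, y^2)$ and $g(a, b) = ab$, so that $g \circ f(x, y) = x^2 y^2 + x y^3$. The sketch multiplies $Dg(f(x, y))$ by $Df(x, y)$ and compares the result with the gradient of $g \circ f$ computed by hand; all function names are hypothetical.

```python
def mat_mul(A, B):
    """Matrix product of an m x n matrix A and an n x p matrix B (lists of rows)."""
    n, p = len(B), len(B[0])
    return [[sum(A[i][j] * B[j][k] for j in range(n)) for k in range(p)]
            for i in range(len(A))]

def f(x, y):
    """Inner map f : R^2 -> R^2 (an example of our choosing)."""
    return (x * x + x * y, y * y)

def Df(x, y):
    """Derivative matrix of f, a 2 x 2 matrix."""
    return [[2 * x + y, x],
            [0, 2 * y]]

def Dg(a, b):
    """Derivative matrix of g(a, b) = ab, a 1 x 2 matrix."""
    return [[b, a]]

def D_composite_by_chain_rule(x, y):
    """Dg(f(x, y)) Df(x, y), as in the matrix form of the chain rule."""
    return mat_mul(Dg(*f(x, y)), Df(x, y))

def D_composite_direct(x, y):
    """Gradient of g(f(x, y)) = x^2 y^2 + x y^3, differentiated by hand."""
    return [[2 * x * y ** 2 + y ** 3, 2 * x ** 2 * y + 3 * x * y ** 2]]

if __name__ == "__main__":
    print(D_composite_by_chain_rule(2, 3), D_composite_direct(2, 3))
```

With integer inputs the two computations agree exactly, since no rounding is involved.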


Example 17.4.2. Let $f : \mathbf{R}^n \to \mathbf{R}$ and $g : \mathbf{R}^n \to \mathbf{R}$ be
differentiable functions. We form the combined function $h : \mathbf{R}^n \to \mathbf{R}^2$
by defining $h(x) := (f(x), g(x))$. Now let $k : \mathbf{R}^2 \to \mathbf{R}$ be the
multiplication function $k(a, b) := ab$. Note that

$$Dh(x_0) = \begin{pmatrix} \nabla f(x_0) \\ \nabla g(x_0) \end{pmatrix}$$

while
$$Dk(a, b) = (b, a)$$

(why?). By the chain rule, we thus see that

$$D(k \circ h)(x_0) = (g(x_0), f(x_0)) \begin{pmatrix} \nabla f(x_0) \\ \nabla g(x_0) \end{pmatrix} = g(x_0) \nabla f(x_0) + f(x_0) \nabla g(x_0).$$

But $k \circ h = fg$ (why?), and $D(fg) = \nabla(fg)$. We have thus proven
the product rule

$$\nabla(fg) = g \nabla f + f \nabla g.$$

A similar argument gives the sum rule $\nabla(f + g) = \nabla f + \nabla g$,
or the difference rule $\nabla(f - g) = \nabla f - \nabla g$, as well as the quotient
rule (Exercise 17.4.4). As you can see, the several variable chain
rule is quite powerful, and can be used to deduce many other rules
of differentiation.

We do record one further useful application of the chain rule. Let $T : \mathbf{R}^n \to \mathbf{R}^m$ be a linear transformation. From Exercise 17.4.1 we observe that $T$ is continuously differentiable at every point, and in fact $T'(x) = T$ for every $x$. (This equation may look a little strange, but perhaps it is easier to swallow if you view it in the form $\frac{d}{dx}(Tx) = T$.) Thus, for any differentiable function $f : E \to \mathbf{R}^n$, we see that $Tf : E \to \mathbf{R}^m$ is also differentiable, and hence by the chain rule

$$(Tf)'(x_0) = T(f'(x_0)).$$

This is a generalization of the single-variable calculus rule $(cf)' = c(f')$ for constant scalars $c$.

Another special case of the chain rule which is quite useful is the following: if $f : \mathbf{R}^n \to \mathbf{R}^m$ is some differentiable function, and $x_j : \mathbf{R} \to \mathbf{R}$ are differentiable functions for each $j = 1, \ldots, n$, then

$$\frac{d}{dt} f(x_1(t), x_2(t), \ldots, x_n(t)) = \sum_{j=1}^n x_j'(t) \frac{\partial f}{\partial x_j}(x_1(t), x_2(t), \ldots, x_n(t)).$$

(Why is this a special case of the chain rule?)
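As a quick sanity check of the formula for differentiating along a curve, the sketch below takes $n = 2$ and $m = 1$. The function `f`, the curve `(x1, x2)`, and the evaluation time `t0` are hypothetical choices made only for this illustration. The left-hand side is approximated by a central difference in $t$; the right-hand side uses the exact partial derivatives of the chosen `f`.

```python
import math

def f(a, b):            # hypothetical scalar function of two variables
    return a ** 2 * b + math.sin(b)

def df_da(a, b):        # exact partial derivative of f in its first variable
    return 2 * a * b

def df_db(a, b):        # exact partial derivative of f in its second variable
    return a ** 2 + math.cos(b)

def x1(t): return math.cos(t)   # hypothetical curve, first coordinate
def x2(t): return t ** 2        # hypothetical curve, second coordinate
def dx1(t): return -math.sin(t)
def dx2(t): return 2 * t

def lhs(t, h=1e-6):
    """Central-difference approximation of d/dt f(x1(t), x2(t))."""
    return (f(x1(t + h), x2(t + h)) - f(x1(t - h), x2(t - h))) / (2 * h)

def rhs(t):
    """Sum over j of x_j'(t) times the j-th partial derivative of f."""
    a, b = x1(t), x2(t)
    return dx1(t) * df_da(a, b) + dx2(t) * df_db(a, b)

t0 = 0.7
gap = abs(lhs(t0) - rhs(t0))
print(gap)  # small, limited only by finite-difference error
```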


Exercise 17.4.1. Let $T : \mathbf{R}^n \to \mathbf{R}^m$ be a linear transformation. Show that $T$ is continuously differentiable at every point, and in fact $T'(x) = T$ for every $x$. What is $DT$?

Exercise 17.4.2. Let $E$ be a subset of $\mathbf{R}^n$. Prove that if a function $f : E \to \mathbf{R}^m$ is differentiable at an interior point $x_0$ of $E$, then it is also continuous at $x_0$. (Hint: use Exercise 17.1.4.)

Exercise 17.4.3. Prove Theorem 17.4.1. (Hint: you may wish to review 
the proof of the ordinary chain rule in single variable calculus, Theorem 
10.1.15. The easiest way to proceed is by using the sequence-based 
definition of limit (see Proposition 14.1.5(b)), and use Exercise 17.1.4.) 

Exercise 17.4.4. State and prove some version of the quotient rule for functions of several variables (i.e., functions of the form $f : E \to \mathbf{R}$ for some subset $E$ of $\mathbf{R}^n$). In other words, state a rule which gives a formula for the gradient of $f/g$; compare your answer with Theorem 10.1.13(h). Be sure to make clear what all your assumptions are.

Exercise 17.4.5. Let $x : \mathbf{R} \to \mathbf{R}^3$ be a differentiable function, and let $r : \mathbf{R} \to \mathbf{R}$ be the function $r(t) := \|x(t)\|$, where $\|x\|$ denotes the length of $x$ as measured in the usual $l^2$ metric. Let $t_0$ be a real number. Show that if $r(t_0) \neq 0$, then $r$ is differentiable at $t_0$, and

$$r'(t_0) = \frac{x'(t_0) \cdot x(t_0)}{r(t_0)}.$$

(Hint: use Theorem 17.4.1.)


17.5 Double derivatives and Clairaut’s theorem 


We now investigate what happens if one differentiates a function 
twice. 


Definition 17.5.1 (Twice continuous differentiability). Let $E$ be an open subset of $\mathbf{R}^n$, and let $f : E \to \mathbf{R}^m$ be a function. We say that $f$ is continuously differentiable if the partial derivatives $\frac{\partial f}{\partial x_1}, \ldots, \frac{\partial f}{\partial x_n}$ exist and are continuous on $E$. We say that $f$ is twice continuously differentiable if it is continuously differentiable, and the partial derivatives $\frac{\partial f}{\partial x_1}, \ldots, \frac{\partial f}{\partial x_n}$ are themselves continuously differentiable.


Remark 17.5.2. Continuously differentiable functions are sometimes called $C^1$ functions; twice continuously differentiable functions are sometimes called $C^2$ functions. One can also define $C^3$, $C^4$, etc. but we shall not do so here.


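Example 17.5.3. Let $f : \mathbf{R}^2 \to \mathbf{R}^2$ be the function $f(x,y) = (x^2 + xy, y^2)$. Then $f$ is continuously differentiable because the partial derivatives $\frac{\partial f}{\partial x}(x,y) = (2x + y, 0)$ and $\frac{\partial f}{\partial y}(x,y) = (x, 2y)$ exist and are continuous on all of $\mathbf{R}^2$. It is also twice continuously differentiable, because the double partial derivatives $\frac{\partial}{\partial x}\frac{\partial f}{\partial x}(x,y) = (2,0)$, $\frac{\partial}{\partial y}\frac{\partial f}{\partial x}(x,y) = (1,0)$, $\frac{\partial}{\partial x}\frac{\partial f}{\partial y}(x,y) = (1,0)$, $\frac{\partial}{\partial y}\frac{\partial f}{\partial y}(x,y) = (0,2)$ all exist and are continuous.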
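The mixed partial derivatives in Example 17.5.3 can be checked numerically. In the sketch below, the first partial derivatives are entered exactly as computed in the example, and each is then differentiated in the other variable by a central difference. The helper names `d_dx` and `d_dy`, the step size, and the sample point are assumptions made for this illustration.

```python
def df_dx(x, y):
    """First partial derivative in x of f(x, y) = (x^2 + x*y, y^2)."""
    return (2 * x + y, 0.0)

def df_dy(x, y):
    """First partial derivative in y of f(x, y) = (x^2 + x*y, y^2)."""
    return (x, 2 * y)

def d_dx(func, x, y, h=1e-5):
    """Central difference in the x variable, applied componentwise."""
    p, m = func(x + h, y), func(x - h, y)
    return tuple((a - b) / (2 * h) for a, b in zip(p, m))

def d_dy(func, x, y, h=1e-5):
    """Central difference in the y variable, applied componentwise."""
    p, m = func(x, y + h), func(x, y - h)
    return tuple((a - b) / (2 * h) for a, b in zip(p, m))

point = (0.3, -1.1)               # arbitrary sample point
dy_dx_f = d_dy(df_dx, *point)     # derivative in y of df/dx
dx_dy_f = d_dx(df_dy, *point)     # derivative in x of df/dy
print(dy_dx_f, dx_dy_f)           # both should be close to (1, 0)
```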


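Observe in the above example that the double derivatives $\frac{\partial}{\partial y}\frac{\partial f}{\partial x}$ and $\frac{\partial}{\partial x}\frac{\partial f}{\partial y}$ are the same. This is in fact a general phenomenon: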


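Theorem 17.5.4 (Clairaut's theorem). Let $E$ be an open subset of $\mathbf{R}^n$, and let $f : E \to \mathbf{R}$ be a twice continuously differentiable function on $E$. Then we have $\frac{\partial}{\partial x_j}\frac{\partial f}{\partial x_i}(x_0) = \frac{\partial}{\partial x_i}\frac{\partial f}{\partial x_j}(x_0)$ for all $x_0 \in E$ and all $1 \le i, j \le n$.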


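Proof. The claim is trivial if $i = j$, so we shall assume that $i \neq j$. We shall prove the theorem for $x_0 = 0$; the general case is similar. (Actually, once one proves Clairaut's theorem for $x_0 = 0$, one can immediately obtain it for general $x_0$ by applying the theorem with $f(x)$ replaced by $f(x - x_0)$.)

Let $a$ be the number $a := \frac{\partial}{\partial x_j}\frac{\partial f}{\partial x_i}(0)$, and $a'$ denote the quantity $a' := \frac{\partial}{\partial x_i}\frac{\partial f}{\partial x_j}(0)$. Our task is to show that $a' = a$.

Let $\varepsilon > 0$. Because the double derivatives of $f$ are continuous, we can find a $\delta > 0$ such that

$$\left| \frac{\partial}{\partial x_j}\frac{\partial f}{\partial x_i}(x) - a \right| \le \varepsilon$$

and

$$\left| \frac{\partial}{\partial x_i}\frac{\partial f}{\partial x_j}(x) - a' \right| \le \varepsilon$$

whenever $|x| \le 2\delta$.

Now we consider the quantity

$$X := f(\delta e_i + \delta e_j) - f(\delta e_i) - f(\delta e_j) + f(0).$$

From the fundamental theorem of calculus in the $e_i$ variable, we have

$$f(\delta e_i + \delta e_j) - f(\delta e_j) = \int_0^\delta \frac{\partial f}{\partial x_i}(x_i e_i + \delta e_j) \, dx_i$$

and

$$f(\delta e_i) - f(0) = \int_0^\delta \frac{\partial f}{\partial x_i}(x_i e_i) \, dx_i$$

and hence

$$X = \int_0^\delta \left( \frac{\partial f}{\partial x_i}(x_i e_i + \delta e_j) - \frac{\partial f}{\partial x_i}(x_i e_i) \right) dx_i.$$

But by the mean value theorem, for each $x_i$ we have

$$\frac{\partial f}{\partial x_i}(x_i e_i + \delta e_j) - \frac{\partial f}{\partial x_i}(x_i e_i) = \delta \frac{\partial}{\partial x_j}\frac{\partial f}{\partial x_i}(x_i e_i + x_j e_j)$$

for some $0 \le x_j \le \delta$. By our construction of $\delta$, we thus have

$$\left| \frac{\partial f}{\partial x_i}(x_i e_i + \delta e_j) - \frac{\partial f}{\partial x_i}(x_i e_i) - \delta a \right| \le \varepsilon \delta.$$

Integrating this from $0$ to $\delta$, we thus obtain

$$|X - \delta^2 a| \le \varepsilon \delta^2.$$

We can run the same argument with the rôle of $i$ and $j$ reversed (note that $X$ is symmetric in $i$ and $j$), to obtain

$$|X - \delta^2 a'| \le \varepsilon \delta^2.$$

From the triangle inequality we thus obtain

$$|\delta^2 a - \delta^2 a'| \le 2 \varepsilon \delta^2,$$

and thus

$$|a - a'| \le 2\varepsilon.$$

But this is true for all $\varepsilon > 0$, and $a$ and $a'$ do not depend on $\varepsilon$, and so we must have $a = a'$, as desired. □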


One should caution that Clairaut’s theorem fails if we do 
not assume the double derivatives to be continuous; see Exercise 
17.5.1. 


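Exercise 17.5.1. Let $f : \mathbf{R}^2 \to \mathbf{R}$ be the function defined by $f(x,y) := \frac{xy^3}{x^2 + y^2}$ when $(x,y) \neq (0,0)$, and $f(0,0) := 0$. Show that $f$ is continuously differentiable, and the double derivatives $\frac{\partial}{\partial y}\frac{\partial f}{\partial x}$ and $\frac{\partial}{\partial x}\frac{\partial f}{\partial y}$ exist, but are not equal to each other at $(0,0)$. Explain why this does not contradict Clairaut's theorem.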


17.6 The contraction mapping theorem 


Before we turn to the next topic - namely, the inverse function 
theorem - we need to develop a useful fact from the theory of 
complete metric spaces, namely the contraction mapping theorem. 


Definition 17.6.1 (Contraction). Let $(X,d)$ be a metric space, and let $f : X \to X$ be a map. We say that $f$ is a contraction if we have $d(f(x), f(y)) \le d(x,y)$ for all $x, y \in X$. We say that $f$ is a strict contraction if there exists a constant $0 < c < 1$ such that $d(f(x), f(y)) \le c \, d(x,y)$ for all $x, y \in X$; we call $c$ the contraction constant of $f$.


Examples 17.6.2. The map $f : \mathbf{R} \to \mathbf{R}$ defined by $f(x) := x + 1$ is a contraction but not a strict contraction. The map $f : \mathbf{R} \to \mathbf{R}$ defined by $f(x) := x/2$ is a strict contraction. The map $f : [0,1] \to [0,1]$ defined by $f(x) := x - x^2$ is a contraction but not a strict contraction. (For justifications of these statements, see Exercise 17.6.5.)

Definition 17.6.3 (Fixed points). Let $f : X \to X$ be a map, and $x \in X$. We say that $x$ is a fixed point of $f$ if $f(x) = x$.


Contractions do not necessarily have any fixed points; for instance, the map $f : \mathbf{R} \to \mathbf{R}$ defined by $f(x) = x + 1$ does not. However, it turns out that strict contractions always do, at least when $X$ is complete:


Theorem 17.6.4 (Contraction mapping theorem). Let $(X,d)$ be a metric space, and let $f : X \to X$ be a strict contraction. Then $f$ can have at most one fixed point. Moreover, if we also assume that $X$ is non-empty and complete, then $f$ has exactly one fixed point.

Proof. See Exercise 17.6.7. □
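The iteration $x_{n+1} = f(x_n)$ suggested in the hint to Exercise 17.6.7 is easy to try out numerically. The sketch below is illustrative only: the map $f(x) = \cos(x)/2$ on $\mathbf{R}$ (a strict contraction with contraction constant $1/2$, since $|f'(x)| \le 1/2$), the starting point, the tolerance, and the helper name `iterate_to_fixed_point` are all assumptions made for this example.

```python
import math

def iterate_to_fixed_point(f, x0, tol=1e-12, max_steps=1000):
    """Iterate x -> f(x) until two successive iterates differ by at most tol."""
    x = x0
    for step in range(max_steps):
        nxt = f(x)
        if abs(nxt - x) <= tol:
            return nxt, step + 1
        x = nxt
    raise RuntimeError("no convergence within max_steps")

def f(x):
    # Hypothetical strict contraction on R: |f'(x)| = |sin(x)| / 2 <= 1/2.
    return math.cos(x) / 2

fixed, steps = iterate_to_fixed_point(f, x0=10.0)
residual = abs(f(fixed) - fixed)
print(fixed, steps, residual)
```

Starting from a different `x0` should lead to the same limit, in line with the uniqueness part of the theorem.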


Remark 17.6.5. The contraction mapping theorem is one example of a fixed point theorem - a theorem which guarantees, assuming certain conditions, that a map will have a fixed point. There are a number of other fixed point theorems which are also useful. One amusing one is the so-called hairy ball theorem, which (among other things) states that any continuous map $f : S^2 \to S^2$ from the sphere $S^2 := \{(x,y,z) \in \mathbf{R}^3 : x^2 + y^2 + z^2 = 1\}$ to itself, must contain either a fixed point, or an anti-fixed point (a point $x \in S^2$ such that $f(x) = -x$). A proof of this theorem can be found in any topology text; it is beyond the scope of this text.


We shall give one consequence of the contraction mapping the- 
orem which is important for our application to the inverse function 
theorem. Basically, this says that any map f on a ball which is a 
“small” perturbation of the identity map, remains one-to-one and 
cannot create any internal holes in the ball. 


Lemma 17.6.6. Let $B(0,r)$ be a ball in $\mathbf{R}^n$ centered at the origin, and let $g : B(0,r) \to \mathbf{R}^n$ be a map such that $g(0) = 0$ and

$$\|g(x) - g(y)\| \le \frac{1}{2} \|x - y\|$$

for all $x, y \in B(0,r)$ (here $\|x\|$ denotes the length of $x$ in $\mathbf{R}^n$). Then the function $f : B(0,r) \to \mathbf{R}^n$ defined by $f(x) := x + g(x)$ is one-to-one, and furthermore the image $f(B(0,r))$ of this map contains the ball $B(0,r/2)$.


Proof. We first show that $f$ is one-to-one. Suppose for sake of contradiction that we had two different points $x, y \in B(0,r)$ such that $f(x) = f(y)$. But then we would have $x + g(x) = y + g(y)$, and hence

$$\|g(x) - g(y)\| = \|x - y\|.$$

The only way this can be consistent with our hypothesis $\|g(x) - g(y)\| \le \frac{1}{2}\|x - y\|$ is if $\|x - y\| = 0$, i.e., if $x = y$, a contradiction. Thus $f$ is one-to-one.

Now we show that $f(B(0,r))$ contains $B(0,r/2)$. Let $y$ be any point in $B(0,r/2)$; our objective is to find a point $x \in B(0,r)$ such that $f(x) = y$, or in other words that $x = y - g(x)$. So the problem is now to find a fixed point of the map $x \mapsto y - g(x)$.

Let $F : B(0,r) \to B(0,r)$ denote the function $F(x) := y - g(x)$. Observe that if $x \in B(0,r)$, then

$$\|F(x)\| \le \|y\| + \|g(x)\| \le \frac{r}{2} + \|g(x) - g(0)\| \le \frac{r}{2} + \frac{1}{2}\|x - 0\| < \frac{r}{2} + \frac{r}{2} = r,$$

so $F$ does indeed map $B(0,r)$ to itself. Also, for any $x, x'$ in $B(0,r)$ we have

$$\|F(x) - F(x')\| = \|g(x') - g(x)\| \le \frac{1}{2}\|x - x'\|$$

so $F$ is a strict contraction. By the contraction mapping theorem, $F$ has a fixed point, i.e., there exists an $x$ such that $x = y - g(x)$. But this means that $f(x) = y$, as desired. □
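The fixed point argument in this proof also gives a practical way to solve $x + g(x) = y$: iterate the map $F(x) := y - g(x)$ starting from $0$. The sketch below works in $\mathbf{R}^2$ with $r = 1$. The particular perturbation `g`, the target `y`, the iteration count, and the helper name `solve` are hypothetical choices for this illustration; the chosen `g` satisfies $g(0) = 0$ and is Lipschitz with constant $1/4$, which is within the bound $1/2$ required by the lemma.

```python
import math

def g(x):
    # Hypothetical perturbation with g(0) = 0; its partial derivatives are
    # bounded by 1/4 in absolute value, so it is Lipschitz with constant 1/4.
    return (0.25 * math.sin(x[1]), 0.25 * (math.cos(x[0]) - 1.0))

def f(x):
    """The perturbed identity map f(x) = x + g(x)."""
    gx = g(x)
    return (x[0] + gx[0], x[1] + gx[1])

def solve(y, steps=60):
    """Iterate F(x) = y - g(x), starting from the origin."""
    x = (0.0, 0.0)
    for _ in range(steps):
        gx = g(x)
        x = (y[0] - gx[0], y[1] - gx[1])
    return x

r = 1.0
y = (0.3, -0.2)                 # a point of B(0, r/2)
x = solve(y)
fx = f(x)
error = math.hypot(fx[0] - y[0], fx[1] - y[1])   # distance from f(x) to y
norm_x = math.hypot(x[0], x[1])                  # should be less than r
print(x, error, norm_x)
```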


Exercise 17.6.1. Let $f : [a,b] \to \mathbf{R}$ be a differentiable function of one variable such that $|f'(x)| \le 1$ for all $x \in [a,b]$. Prove that $f$ is a contraction. (Hint: use the mean-value theorem, Corollary 10.2.9.) If in addition $|f'(x)| < 1$ for all $x \in [a,b]$ and $f'$ is continuous, show that $f$ is a strict contraction.

Exercise 17.6.2. Show that if $f : [a,b] \to \mathbf{R}$ is differentiable and is a contraction, then $|f'(x)| \le 1$ for all $x \in [a,b]$.

Exercise 17.6.3. Give an example of a function $f : [a,b] \to \mathbf{R}$ which is continuously differentiable and which is a strict contraction, but such that $|f'(x)| = 1$ for at least one value of $x \in [a,b]$.

Exercise 17.6.4. Give an example of a function $f : [a,b] \to \mathbf{R}$ which is a strict contraction but which is not differentiable for at least one point $x$ in $[a,b]$.

Exercise 17.6.5. Verify the claims in Examples 17.6.2.


Exercise 17.6.6. Show that every contraction on a metric space X is 
necessarily continuous. 


Exercise 17.6.7. Prove Theorem 17.6.4. (Hint: to prove that there is at most one fixed point, argue by contradiction. To prove that there is at least one fixed point, pick any $x_0 \in X$ and define recursively $x_1 = f(x_0)$, $x_2 = f(x_1)$, $x_3 = f(x_2)$, etc. Prove inductively that $d(x_{n+1}, x_n) \le c^n d(x_1, x_0)$, and conclude (using the geometric series formula, Lemma 7.3.3) that the sequence $(x_n)_{n=0}^\infty$ is a Cauchy sequence. Then prove that the limit of this sequence is a fixed point of $f$.)

Exercise 17.6.8. Let $(X,d)$ be a complete metric space, and let $f : X \to X$ and $g : X \to X$ be two strict contractions on $X$ with contraction coefficients $c$ and $c'$ respectively. From Theorem 17.6.4 we know that $f$ has some fixed point $x_0$, and $g$ has some fixed point $y_0$. Suppose we know that there is an $\varepsilon > 0$ such that $d(f(x), g(x)) \le \varepsilon$ for all $x \in X$ (i.e., $f$ and $g$ are within $\varepsilon$ of each other in the uniform metric). Show that $d(x_0, y_0) \le \varepsilon/(1 - \max(c, c'))$. Thus nearby contractions have nearby fixed points.


17.7 The inverse function theorem in several variable calculus


We recall the inverse function theorem in single variable calculus (Theorem 10.4.2), which asserts that if a function $f : \mathbf{R} \to \mathbf{R}$ is invertible, differentiable, and $f'(x_0)$ is non-zero, then $f^{-1}$ is differentiable at $f(x_0)$, and

$$(f^{-1})'(f(x_0)) = \frac{1}{f'(x_0)}.$$

In fact, one can say something even when $f$ is not invertible, as long as we know that $f$ is continuously differentiable. If $f'(x_0)$ is non-zero, then $f'(x_0)$ must be either strictly positive or strictly negative, which implies (since we are assuming $f'$ to be continuous) that $f'(x)$ is either strictly positive for $x$ near $x_0$, or strictly negative for $x$ near $x_0$. In particular, $f$ must be either strictly increasing near $x_0$, or strictly decreasing near $x_0$. In either case, $f$ will become invertible if we restrict the domain and range of $f$ to be sufficiently close to $x_0$ and to $f(x_0)$ respectively. (The technical terminology for this is that $f$ is locally invertible near $x_0$.)

The requirement that f be continuously differentiable is im- 
portant; see Exercise 17.7.1. 

It turns out that a similar theorem is true for functions $f : \mathbf{R}^n \to \mathbf{R}^n$ from one Euclidean space to the same space. However, the condition that $f'(x_0)$ is non-zero must be replaced with a slightly different one, namely that $f'(x_0)$ is invertible. We first remark that the inverse of a linear transformation is also linear:


Lemma 17.7.1. Let $T : \mathbf{R}^n \to \mathbf{R}^n$ be a linear transformation which is also invertible. Then the inverse transformation $T^{-1} : \mathbf{R}^n \to \mathbf{R}^n$ is also linear.

Proof. See Exercise 17.7.2. □


We can now prove an important and useful theorem, arguably 
one of the most important theorems in several variable differential 
calculus. 


Theorem 17.7.2 (Inverse function theorem). Let $E$ be an open subset of $\mathbf{R}^n$, and let $f : E \to \mathbf{R}^n$ be a function which is continuously differentiable on $E$. Suppose $x_0 \in E$ is such that the linear transformation $f'(x_0) : \mathbf{R}^n \to \mathbf{R}^n$ is invertible. Then there exists an open set $U$ in $E$ containing $x_0$, and an open set $V$ in $\mathbf{R}^n$ containing $f(x_0)$, such that $f$ is a bijection from $U$ to $V$. In particular, there is an inverse map $f^{-1} : V \to U$. Furthermore, this inverse map is differentiable at $f(x_0)$, and

$$(f^{-1})'(f(x_0)) = (f'(x_0))^{-1}.$$


Proof. We first observe that once we know the inverse map $f^{-1}$ is differentiable, the formula $(f^{-1})'(f(x_0)) = (f'(x_0))^{-1}$ is automatic. This comes from starting with the identity

$$I = f^{-1} \circ f$$

on $U$, where $I : \mathbf{R}^n \to \mathbf{R}^n$ is the identity map $Ix := x$, and then differentiating both sides using the chain rule at $x_0$ to obtain

$$I'(x_0) = (f^{-1})'(f(x_0)) f'(x_0).$$

Since $I'(x_0) = I$, we thus have $(f^{-1})'(f(x_0)) = (f'(x_0))^{-1}$ as desired.

We remark that this argument shows that if $f'(x_0)$ is not invertible, then there is no way that an inverse $f^{-1}$ can exist and be differentiable at $f(x_0)$.

Next, we observe that it suffices to prove the theorem under the additional assumption $f(x_0) = 0$. The general case then follows from the special case by replacing $f$ by a new function $\tilde f(x) := f(x) - f(x_0)$ and then applying the special case to $\tilde f$ (note that $V$ will have to shift by $f(x_0)$). Note that $\tilde f^{-1}(y) = f^{-1}(y + f(x_0))$ - why? Henceforth we will always assume $f(x_0) = 0$.

In a similar manner, one can make the assumption $x_0 = 0$. The general case then follows from this case by replacing $f$ by a new function $\tilde f(x) := f(x + x_0)$ and applying the special case to $\tilde f$ (note that $E$ and $U$ will have to shift by $x_0$). Note that $f^{-1}(y) = \tilde f^{-1}(y) + x_0$ - why? Henceforth we will always assume $x_0 = 0$. Thus we now have that $f(0) = 0$ and that $f'(0)$ is invertible.

Finally, one can assume that $f'(0) = I$, where $I : \mathbf{R}^n \to \mathbf{R}^n$ is the identity transformation $Ix = x$. The general case then follows from this case by replacing $f$ with a new function $\tilde f : E \to \mathbf{R}^n$ defined by $\tilde f(x) := f'(0)^{-1} f(x)$, and applying the special case to this case. Note from Lemma 17.7.1 that $f'(0)^{-1}$ is a linear transformation. In particular, we note that $\tilde f(0) = 0$ and that

$$\tilde f'(0) = f'(0)^{-1} f'(0) = I,$$

so by the special case of the inverse function theorem we know that there exists an open set $U'$ containing $0$, and an open set $V'$ containing $0$, such that $\tilde f$ is a bijection from $U'$ to $V'$, and that $\tilde f^{-1} : V' \to U'$ is differentiable at $0$ with derivative $I$. But we have $f(x) = f'(0) \tilde f(x)$, and hence $f$ is a bijection from $U'$ to $f'(0)(V')$ (note that $f'(0)$ is also a bijection). Since $f'(0)$ and its inverse are both continuous, $f'(0)(V')$ is open, and it certainly contains $0$. Now consider the inverse function $f^{-1} : f'(0)(V') \to U'$. Since $f(x) = f'(0) \tilde f(x)$, we see that $f^{-1}(y) = \tilde f^{-1}(f'(0)^{-1} y)$ for all $y \in f'(0)(V')$ (why? use the fact that $\tilde f$ is a bijection from $U'$ to $V'$). In particular we see that $f^{-1}$ is differentiable at $0$.

So all we have to do now is prove the inverse function theorem in the special case, when $x_0 = 0$, $f(x_0) = 0$, and $f'(x_0) = I$. Let $g : E \to \mathbf{R}^n$ denote the function $g(x) := f(x) - x$. Then $g(0) = 0$ and $g'(0) = 0$. In particular

$$\frac{\partial g}{\partial x_j}(0) = 0$$

for $j = 1, \ldots, n$. Since $g$ is continuously differentiable, there thus exists a ball $B(0,r)$ in $E$ such that

$$\left\| \frac{\partial g}{\partial x_j}(x) \right\| \le \frac{1}{2n^2}$$

for all $x \in B(0,r)$. (There is nothing particularly special about $\frac{1}{2n^2}$; we just need a nice small number here.) In particular, for any $x \in B(0,r)$ and $v = (v_1, \ldots, v_n)$ we have

$$\|D_v g(x)\| = \left\| \sum_{j=1}^n v_j \frac{\partial g}{\partial x_j}(x) \right\| \le \sum_{j=1}^n |v_j| \left\| \frac{\partial g}{\partial x_j}(x) \right\| \le \sum_{j=1}^n \|v\| \frac{1}{2n^2} \le \frac{1}{2n} \|v\|.$$

But now for any $x, y \in B(0,r)$, we have by the fundamental theorem of calculus

$$g(y) - g(x) = \int_0^1 \frac{d}{dt} g(x + t(y - x)) \, dt = \int_0^1 D_{y-x} g(x + t(y - x)) \, dt.$$

By the previous remark, the vectors $D_{y-x} g(x + t(y - x))$ have a magnitude of at most $\frac{1}{2n}\|y - x\|$. Thus every component of these vectors has magnitude at most $\frac{1}{2n}\|y - x\|$. Thus every component of $g(y) - g(x)$ has magnitude at most $\frac{1}{2n}\|y - x\|$, and hence $g(y) - g(x)$ itself has magnitude at most $\frac{1}{2}\|y - x\|$ (actually, it will be substantially less than this, but this bound will be enough for our purposes). In other words, $g$ is a contraction. By Lemma 17.6.6, the map $f = g + I$ is thus one-to-one on $B(0,r)$, and the image $f(B(0,r))$ contains $B(0,r/2)$. In particular we have an inverse map $f^{-1} : B(0,r/2) \to B(0,r)$ defined on $B(0,r/2)$.

Applying the contraction bound with $y = 0$ we obtain in particular that

$$\|g(x)\| \le \frac{1}{2}\|x\|$$

for all $x \in B(0,r)$, and so by the triangle inequality

$$\frac{1}{2}\|x\| \le \|f(x)\| \le \frac{3}{2}\|x\|$$

for all $x \in B(0,r)$.

Now we set $V := B(0,r/2)$ and $U := f^{-1}(B(0,r/2))$. Then by construction $f$ is a bijection from $U$ to $V$. $V$ is clearly open, and $U = f^{-1}(V)$ is also open since $f$ is continuous. (Notice that if a set is open relative to $B(0,r)$, then it is open in $\mathbf{R}^n$ as well.) Now we want to show that $f^{-1} : V \to U$ is differentiable at $0$ with derivative $I^{-1} = I$. In other words, we wish to show that

$$\lim_{x \to 0; x \in V \setminus \{0\}} \frac{\|f^{-1}(x) - f^{-1}(0) - I(x - 0)\|}{\|x\|} = 0.$$

Since $f(0) = 0$, we have $f^{-1}(0) = 0$, and the above simplifies to

$$\lim_{x \to 0; x \in V \setminus \{0\}} \frac{\|f^{-1}(x) - x\|}{\|x\|} = 0.$$

Let $(x_n)_{n=1}^\infty$ be any sequence in $V \setminus \{0\}$ that converges to $0$. By Proposition 14.1.5(b), it suffices to show that

$$\lim_{n \to \infty} \frac{\|f^{-1}(x_n) - x_n\|}{\|x_n\|} = 0.$$

Write $y_n := f^{-1}(x_n)$. Then $y_n \in B(0,r)$ and $x_n = f(y_n)$. In particular we have

$$\frac{1}{2}\|y_n\| \le \|x_n\| \le \frac{3}{2}\|y_n\|$$

and so since $\|x_n\|$ goes to $0$, $\|y_n\|$ goes to zero also, and their ratio remains bounded. It will thus suffice to show that

$$\lim_{n \to \infty} \frac{\|y_n - f(y_n)\|}{\|y_n\|} = 0.$$

But since $y_n$ is going to $0$, and $f$ is differentiable at $0$, we have

$$\lim_{n \to \infty} \frac{\|f(y_n) - f(0) - f'(0)(y_n - 0)\|}{\|y_n\|} = 0$$

as desired (since $f(0) = 0$ and $f'(0) = I$). □


The inverse function theorem gives a useful criterion for when a function is (locally) invertible at a point $x_0$ - all we need is for its derivative $f'(x_0)$ to be invertible (and then we even get further information, for instance we can compute the derivative of $f^{-1}$ at $f(x_0)$). Of course, this raises the question of how one can tell whether the linear transformation $f'(x_0)$ is invertible or not. Recall that we have $f'(x_0) = L_{Df(x_0)}$, so by Lemmas 17.1.13 and 17.1.16 we see that the linear transformation $f'(x_0)$ is invertible if and only if the matrix $Df(x_0)$ is. There are many ways to check whether a matrix such as $Df(x_0)$ is invertible; for instance, one can use determinants, or alternatively Gaussian elimination methods. We will not pursue this matter here, but refer the reader to any linear algebra text.

If $f'(x_0)$ exists but is non-invertible, then the inverse function theorem does not apply. In such a situation it is not possible for $f^{-1}$ to exist and be differentiable at $f(x_0)$; this was remarked in the above proof. But it is still possible for $f$ to be invertible. For instance, the single-variable function $f : \mathbf{R} \to \mathbf{R}$ defined by $f(x) = x^3$ is invertible despite $f'(0)$ not being invertible.
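The formula $(f^{-1})'(f(x_0)) = (f'(x_0))^{-1}$ can be tested numerically on a concrete map. The sketch below is illustrative only: the map `f` on $\mathbf{R}^2$, the point `x0`, and the helper names are hypothetical. To evaluate the local inverse it uses Newton's method started at `x0`, which is a convenience for this experiment and not the contraction argument used in the proof. The derivative matrix of the local inverse is approximated by central differences and compared with the inverse of the exact derivative matrix of `f` at `x0`.

```python
import math

def f(p):
    # Hypothetical continuously differentiable map from R^2 to R^2.
    x, y = p
    return (x + y * y, y + math.sin(x))

def Df(p):
    """Exact derivative matrix of f at p."""
    x, y = p
    return [[1.0, 2.0 * y], [math.cos(x), 1.0]]

def inv2(M):
    """Inverse of a 2 x 2 matrix with non-zero determinant."""
    (a, b), (c, d) = M
    det = a * d - b * c
    return [[d / det, -b / det], [-c / det, a / det]]

def local_inverse(target, start, steps=50):
    """Newton iteration for f(p) = target, started near the expected preimage."""
    p = start
    for _ in range(steps):
        fp = f(p)
        res = (fp[0] - target[0], fp[1] - target[1])
        J = inv2(Df(p))
        p = (p[0] - (J[0][0] * res[0] + J[0][1] * res[1]),
             p[1] - (J[1][0] * res[0] + J[1][1] * res[1]))
    return p

x0 = (0.2, 0.1)      # Df(x0) has determinant 1 - 0.2*cos(0.2), which is non-zero
y0 = f(x0)
h = 1e-6
numeric = [[0.0, 0.0], [0.0, 0.0]]   # approximates the derivative of f^{-1} at y0
for j in range(2):
    yp = list(y0); yp[j] += h
    ym = list(y0); ym[j] -= h
    xp = local_inverse(tuple(yp), x0)
    xm = local_inverse(tuple(ym), x0)
    for i in range(2):
        numeric[i][j] = (xp[i] - xm[i]) / (2 * h)

expected = inv2(Df(x0))              # (f'(x0))^{-1}
max_gap = max(abs(numeric[i][j] - expected[i][j])
              for i in range(2) for j in range(2))
print(max_gap)  # small, limited only by finite-difference error
```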


Exercise 17.7.1. Let $f : \mathbf{R} \to \mathbf{R}$ be the function defined by $f(x) := x + x^2 \sin(1/x^4)$ for $x \neq 0$ and $f(0) := 0$. Show that $f$ is differentiable and $f'(0) = 1$, but $f$ is not increasing on any open set containing $0$. (Hint: show that the derivative of $f$ can turn negative arbitrarily close to $0$. Drawing a graph of $f$ may aid your intuition.)

Exercise 17.7.2. Prove Lemma 17.7.1. 


Exercise 17.7.3. Let $f : \mathbf{R}^n \to \mathbf{R}^n$ be a continuously differentiable function such that $f'(x)$ is an invertible linear transformation for every $x \in \mathbf{R}^n$. Show that whenever $V$ is an open set in $\mathbf{R}^n$, the image $f(V)$ is also open. (Hint: use the inverse function theorem.)


17.8 The implicit function theorem 


Recall (from Exercise 3.5.10) that a function $f : \mathbf{R} \to \mathbf{R}$ gives rise to a graph

$$\{(x, f(x)) : x \in \mathbf{R}\}$$

which is a subset of $\mathbf{R}^2$, usually looking like a curve. However, not all curves are graphs; they must obey the vertical line test, that for every $x$ there is exactly one $y$ such that $(x,y)$ is in the curve. For instance, the circle $\{(x,y) \in \mathbf{R}^2 : x^2 + y^2 = 1\}$ is not a graph, although if one restricts to a semicircle such as $\{(x,y) \in \mathbf{R}^2 : x^2 + y^2 = 1, y > 0\}$ then one again obtains a graph. Thus while the entire circle is not a graph, certain local portions of it are. (The portions of the circle near $(1,0)$ and $(-1,0)$ are not graphs over the variable $x$, but they are graphs over the variable $y$.)


Similarly, any function $f : \mathbf{R}^n \to \mathbf{R}$ gives rise to a graph $\{(x, f(x)) : x \in \mathbf{R}^n\}$ in $\mathbf{R}^{n+1}$, which in general looks like some sort of $n$-dimensional surface in $\mathbf{R}^{n+1}$ (the technical term for this is a hypersurface). Conversely, one may ask which hypersurfaces are actually graphs of some function, and whether that function is continuous or differentiable.

If the hypersurface is given geometrically, then one can again invoke the vertical line test to work out whether it is a graph or not. But what if the hypersurface is given algebraically, for instance the surface $\{(x,y,z) \in \mathbf{R}^3 : xy + yz + zx = -1\}$? Or more generally, a hypersurface of the form $\{x \in \mathbf{R}^n : f(x) = 0\}$, where $f : \mathbf{R}^n \to \mathbf{R}$ is some function? In this case, it is still possible to say whether the hypersurface is a graph, locally at least, by means of the implicit function theorem.


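Theorem 17.8.1 (Implicit function theorem). Let $E$ be an open subset of $\mathbf{R}^n$, let $f : E \to \mathbf{R}$ be continuously differentiable, and let $y = (y_1, \ldots, y_n)$ be a point in $E$ such that $f(y) = 0$ and $\frac{\partial f}{\partial x_n}(y) \neq 0$. Then there exists an open subset $U$ of $\mathbf{R}^{n-1}$ containing $(y_1, \ldots, y_{n-1})$, an open subset $V$ of $E$ containing $y$, and a function $g : U \to \mathbf{R}$ such that $g(y_1, \ldots, y_{n-1}) = y_n$, and

$$\{(x_1, \ldots, x_n) \in V : f(x_1, \ldots, x_n) = 0\} = \{(x_1, \ldots, x_{n-1}, g(x_1, \ldots, x_{n-1})) : (x_1, \ldots, x_{n-1}) \in U\}.$$

In other words, the set $\{x \in V : f(x) = 0\}$ is a graph of a function over $U$. Moreover, $g$ is differentiable at $(y_1, \ldots, y_{n-1})$, and we have

$$\frac{\partial g}{\partial x_j}(y_1, \ldots, y_{n-1}) = - \frac{\partial f}{\partial x_j}(y) \Big/ \frac{\partial f}{\partial x_n}(y) \qquad (17.1)$$

for all $1 \le j \le n-1$.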


Remark 17.8.2. The equation (17.1) is sometimes derived using implicit differentiation. Basically, the point is that if you know that

$$f(x_1, \ldots, x_n) = 0$$

then (as long as $\frac{\partial f}{\partial x_n} \neq 0$) the variable $x_n$ is "implicitly" defined in terms of the other $n - 1$ variables, and one can differentiate the above identity in, say, the $x_j$ direction using the chain rule to obtain

$$\frac{\partial f}{\partial x_j} + \frac{\partial f}{\partial x_n} \frac{\partial x_n}{\partial x_j} = 0$$

which is (17.1) in disguise (we are using $g$ to represent the implicit function defining $x_n$ in terms of $x_1, \ldots, x_{n-1}$). Thus, the implicit function theorem allows one to define a dependence implicitly, by means of a constraint rather than by a direct formula of the form $x_n = g(x_1, \ldots, x_{n-1})$.


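Proof. This theorem looks somewhat fearsome, but actually it is a fairly quick consequence of the inverse function theorem. Let $F : E \to \mathbf{R}^n$ be the function

$$F(x_1, \ldots, x_n) := (x_1, \ldots, x_{n-1}, f(x_1, \ldots, x_n)).$$

This function is continuously differentiable. Also note that

$$F(y) = (y_1, \ldots, y_{n-1}, 0)$$

and

$$DF(y) = \left( \frac{\partial F}{\partial x_1}(y)^T, \frac{\partial F}{\partial x_2}(y)^T, \ldots, \frac{\partial F}{\partial x_n}(y)^T \right) = \begin{pmatrix} 1 & 0 & \cdots & 0 & 0 \\ 0 & 1 & \cdots & 0 & 0 \\ \vdots & \vdots & \ddots & \vdots & \vdots \\ 0 & 0 & \cdots & 1 & 0 \\ \frac{\partial f}{\partial x_1}(y) & \frac{\partial f}{\partial x_2}(y) & \cdots & \frac{\partial f}{\partial x_{n-1}}(y) & \frac{\partial f}{\partial x_n}(y) \end{pmatrix}.$$

Since $\frac{\partial f}{\partial x_n}(y)$ is assumed by hypothesis to be non-zero, this matrix is invertible; this can be seen either by computing the determinant, or using row reduction, or by computing the inverse explicitly, which is

$$DF(y)^{-1} = \begin{pmatrix} 1 & 0 & \cdots & 0 & 0 \\ 0 & 1 & \cdots & 0 & 0 \\ \vdots & \vdots & \ddots & \vdots & \vdots \\ 0 & 0 & \cdots & 1 & 0 \\ -\frac{\partial f}{\partial x_1}(y)/a & -\frac{\partial f}{\partial x_2}(y)/a & \cdots & -\frac{\partial f}{\partial x_{n-1}}(y)/a & 1/a \end{pmatrix}$$

where we have written $a = \frac{\partial f}{\partial x_n}(y)$ for short. Thus the inverse function theorem applies, and we can find an open set $V$ in $E$ containing $y$, and an open set $W$ in $\mathbf{R}^n$ containing $F(y) = (y_1, \ldots, y_{n-1}, 0)$, such that $F$ is a bijection from $V$ to $W$, and that $F^{-1}$ is differentiable at $(y_1, \ldots, y_{n-1}, 0)$.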

Let us write $F^{-1}$ in co-ordinates as

$$F^{-1}(x) = (h_1(x), h_2(x), \ldots, h_n(x))$$

where $x \in W$. Since $F(F^{-1}(x)) = x$, we have $h_j(x_1, \ldots, x_n) = x_j$ for all $1 \le j \le n-1$ and $x \in W$, and

$$f(x_1, \ldots, x_{n-1}, h_n(x_1, \ldots, x_n)) = x_n.$$

Also, $h_n$ is differentiable at $(y_1, \ldots, y_{n-1}, 0)$ since $F^{-1}$ is.

Now we set $U := \{(x_1, \ldots, x_{n-1}) \in \mathbf{R}^{n-1} : (x_1, \ldots, x_{n-1}, 0) \in W\}$. Note that $U$ is open and contains $(y_1, \ldots, y_{n-1})$. Now we define $g : U \to \mathbf{R}$ by $g(x_1, \ldots, x_{n-1}) := h_n(x_1, \ldots, x_{n-1}, 0)$. Then $g$ is differentiable at $(y_1, \ldots, y_{n-1})$. Now we prove that

$$\{(x_1, \ldots, x_n) \in V : f(x_1, \ldots, x_n) = 0\} = \{(x_1, \ldots, x_{n-1}, g(x_1, \ldots, x_{n-1})) : (x_1, \ldots, x_{n-1}) \in U\}.$$

First suppose that $(x_1, \ldots, x_n) \in V$ and $f(x_1, \ldots, x_n) = 0$. Then we have $F(x_1, \ldots, x_n) = (x_1, \ldots, x_{n-1}, 0)$, which lies in $W$. Thus $(x_1, \ldots, x_{n-1})$ lies in $U$. Applying $F^{-1}$, we see that $(x_1, \ldots, x_n) = F^{-1}(x_1, \ldots, x_{n-1}, 0)$. In particular $x_n = h_n(x_1, \ldots, x_{n-1}, 0)$, and hence $x_n = g(x_1, \ldots, x_{n-1})$. Thus every element of the left-hand set lies in the right-hand set. The reverse inclusion comes by reversing all the above steps and is left to the reader.


Finally, we show the formula for the partial derivatives of $g$. From the preceding discussion we have

$$f(x_1, \ldots, x_{n-1}, g(x_1, \ldots, x_{n-1})) = 0$$

for all $(x_1, \ldots, x_{n-1}) \in U$. Since $g$ is differentiable at $(y_1, \ldots, y_{n-1})$, and $f$ is differentiable at $(y_1, \ldots, y_{n-1}, g(y_1, \ldots, y_{n-1})) = y$, we may use the chain rule, differentiating in $x_j$, to obtain

$$\frac{\partial f}{\partial x_j}(y) + \frac{\partial f}{\partial x_n}(y) \frac{\partial g}{\partial x_j}(y_1, \ldots, y_{n-1}) = 0$$

and the claim follows by simple algebra. □


Example 17.8.3. Consider the surface $S := \{(x,y,z) \in \mathbf{R}^3 : xy + yz + zx = -1\}$, which we rewrite as $\{(x,y,z) \in \mathbf{R}^3 : f(x,y,z) = 0\}$, where $f : \mathbf{R}^3 \to \mathbf{R}$ is the function $f(x,y,z) := xy + yz + zx + 1$. Clearly $f$ is continuously differentiable, and $\frac{\partial f}{\partial z} = y + x$. Thus for any $(x_0, y_0, z_0)$ in $S$ with $y_0 + x_0 \neq 0$, one can write this surface (near $(x_0, y_0, z_0)$) as a graph of the form $\{(x, y, g(x,y)) : (x,y) \in U\}$ for some open set $U$ containing $(x_0, y_0)$, and some function $g$ which is differentiable at $(x_0, y_0)$. Indeed one can implicitly differentiate to obtain that

$$\frac{\partial g}{\partial x}(x_0, y_0) = -\frac{y_0 + z_0}{y_0 + x_0} \quad \text{and} \quad \frac{\partial g}{\partial y}(x_0, y_0) = -\frac{x_0 + z_0}{y_0 + x_0}.$$
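For this particular surface the implicit function can be written down explicitly, which makes the formulas easy to test. Solving $xy + (x + y)z + 1 = 0$ for $z$ gives $z = -(1 + xy)/(x + y)$ whenever $x + y \neq 0$. The sketch below compares central-difference derivatives of this explicit `g` with the implicit-differentiation formulas; the sample point $(x_0, y_0) = (1, 2)$ and the step size are hypothetical choices for this illustration.

```python
def f(x, y, z):
    """Defining function of the surface: f = 0 exactly on S."""
    return x * y + y * z + z * x + 1.0

def g(x, y):
    # Explicit solution of f(x, y, z) = 0 for z, valid when x + y != 0.
    return -(1.0 + x * y) / (x + y)

x0, y0 = 1.0, 2.0       # sample point with x0 + y0 != 0
z0 = g(x0, y0)          # so that (x0, y0, z0) lies on the surface
h = 1e-6

# Central-difference approximations of the partial derivatives of g.
dg_dx_numeric = (g(x0 + h, y0) - g(x0 - h, y0)) / (2 * h)
dg_dy_numeric = (g(x0, y0 + h) - g(x0, y0 - h)) / (2 * h)

# Values predicted by implicit differentiation.
dg_dx_formula = -(y0 + z0) / (y0 + x0)
dg_dy_formula = -(x0 + z0) / (y0 + x0)

print(f(x0, y0, z0), dg_dx_numeric, dg_dx_formula, dg_dy_numeric, dg_dy_formula)
```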

In the implicit function theorem, if the derivative $\frac{\partial f}{\partial x_n}$ equals zero at some point, then it is unlikely that the set $\{x \in \mathbf{R}^n : f(x) = 0\}$ can be written as a graph of the $x_n$ variable in terms of the other $n - 1$ variables near that point. However, if some other derivative $\frac{\partial f}{\partial x_j}$ is non-zero, then it would be possible to write the $x_j$ variable in terms of the other $n - 1$ variables, by a variant of the implicit function theorem. Thus as long as the gradient $\nabla f$ is not entirely zero, one can write this set $\{x \in \mathbf{R}^n : f(x) = 0\}$ as a graph of some variable $x_j$ in terms of the other $n - 1$ variables. (The circle $\{(x,y) \in \mathbf{R}^2 : x^2 + y^2 - 1 = 0\}$ is a good example of this; it is not a graph of $y$ in terms of $x$, or $x$ in terms of $y$, but near every point it is one of the two. And this is because the gradient of $x^2 + y^2 - 1$ is never zero on the circle.) However, if $\nabla f$ does vanish at some point $x_0$, then we say that $f$ has a critical point at $x_0$ and the behavior there is much more complicated. For instance, the set $\{(x,y) \in \mathbf{R}^2 : x^2 - y^2 = 0\}$ has a critical point at $(0,0)$ and there the set does not look like a graph of any sort (it is the union of two lines).


Remark 17.8.4. Sets which look like graphs of continuous functions at every point have a name: they are called manifolds. Thus $\{x \in \mathbf{R}^n : f(x) = 0\}$ will be a manifold if it contains no critical points of $f$. The theory of manifolds is very important in modern geometry (especially differential geometry and algebraic geometry), but we will not discuss it here as it is a graduate level topic.


Chapter 18 


Lebesgue measure 


In the previous chapter we discussed differentiation in several variable calculus. It is now only natural to consider the question of integration in several variable calculus. The general question we wish to answer is this: given some subset $\Omega$ of $\mathbf{R}^n$, and some real-valued function $f : \Omega \to \mathbf{R}$, is it possible to integrate $f$ on $\Omega$ to obtain some number $\int_\Omega f$? (It is possible to consider other types of functions, such as complex-valued or vector-valued functions, but this turns out not to be too difficult once one knows how to integrate real-valued functions, since one can integrate a complex or vector valued function by integrating each real-valued component of that function separately.)


In one dimension we already have developed (in Chapter 11) the notion of a Riemann integral $\int_{[a,b]} f$, which answers this question when $\Omega$ is an interval $\Omega = [a,b]$, and $f$ is Riemann integrable. Exactly what Riemann integrability means is not important here, but let us just remark that every piecewise continuous function is Riemann integrable, and in particular every piecewise constant function is Riemann integrable. However, not all functions are Riemann integrable. It is possible to extend this notion of a Riemann integral to higher dimensions, but it requires quite a bit of effort and one can still only integrate "Riemann integrable" functions, which turn out to be a rather unsatisfactorily small class of functions. (For instance, the pointwise limit of Riemann integrable functions need not be Riemann integrable, and the same goes for an $L^2$ limit, although we have already seen that uniform limits of Riemann integrable functions remain Riemann integrable.)


Because of this, we must look beyond the Riemann integral
to obtain a truly satisfactory notion of integration, one that can 
handle even very discontinuous functions. This leads to the notion 
of the Lebesgue integral, which we shall spend this chapter and the 
next constructing. The Lebesgue integral can handle a very large 
class of functions, including all the Riemann integrable functions 
but also many others as well; in fact, it is safe to say that it can 
integrate virtually any function that one actually needs in math- 
ematics, at least if one works on Euclidean spaces and everything 
is absolutely integrable. (If one assumes the axiom of choice, then 
there are still some pathological functions one can construct which 
cannot be integrated by the Lebesgue integral, but these functions 
will not come up in real-life applications.)


Before we turn to the details, we begin with an informal discussion.
In order to understand how to compute an integral $\int_\Omega f$,
we must first understand a more basic and fundamental question:
how does one compute the length/area/volume of $\Omega$? To see why
this question is connected to that of integration, observe that if
one integrates the function 1 on the set $\Omega$, then one should obtain
the length of $\Omega$ (if $\Omega$ is one-dimensional), the area of $\Omega$ (if $\Omega$ is
two-dimensional), or the volume of $\Omega$ (if $\Omega$ is three-dimensional).
To avoid splitting into cases depending on the dimension, we shall
refer to the measure of $\Omega$ as either the length, area, volume (or
hypervolume, etc.) of $\Omega$, depending on what Euclidean space $\mathbf{R}^n$
we are working in.


Ideally, to every subset $\Omega$ of $\mathbf{R}^n$ we would like to associate a
non-negative number $m(\Omega)$, which will be the measure of $\Omega$ (i.e.,
the length, area, volume, etc.). We allow the possibility for $m(\Omega)$
to be zero (e.g., if $\Omega$ is just a single point or the empty set) or for
$m(\Omega)$ to be infinite (e.g., if $\Omega$ is all of $\mathbf{R}^n$). This measure should
obey certain reasonable properties; for instance, the measure of
the unit cube $(0,1)^n := \{(x_1, \dots, x_n) : 0 < x_i < 1\}$ should equal
1, we should have $m(A \cup B) = m(A) + m(B)$ if $A$ and $B$ are
disjoint, we should have $m(A) \le m(B)$ whenever $A \subseteq B$, and we
should have $m(x + A) = m(A)$ for any $x \in \mathbf{R}^n$ (i.e., if we shift $A$
by the vector $x$ the measure should be the same).

Remarkably, it turns out that such a measure does not exist;
one cannot assign a non-negative number to every subset of $\mathbf{R}^n$
which has the above properties. This is quite a surprising fact, as
it goes against one's intuitive concept of volume; we shall prove it
later in these notes. (An even more dramatic example of this failure
of intuition is the Banach-Tarski paradox, in which a unit ball
in $\mathbf{R}^3$ is decomposed into five pieces, and then the five pieces are
reassembled via translations and rotations to form two complete
and disjoint unit balls, thus violating any concept of conservation
of volume; however we will not discuss this paradox here.)

What these paradoxes mean is that it is impossible to find
a reasonable way to assign a measure to every single subset of
$\mathbf{R}^n$. However, we can salvage matters by only measuring a certain
class of sets in $\mathbf{R}^n$ - the measurable sets. These are the only
sets $\Omega$ for which we will define the measure $m(\Omega)$, and once one
restricts one's attention to measurable sets, one recovers all the
above properties again. Furthermore, almost all the sets one
encounters in real life are measurable (e.g., all open and closed sets
will be measurable), and so this turns out to be good enough to
do analysis.


18.1 The goal: Lebesgue measure 


Let $\mathbf{R}^n$ be a Euclidean space. Our goal in this chapter is to define
a concept of measurable set, which will be a special kind of subset
of $\mathbf{R}^n$, and for every such measurable set $\Omega \subseteq \mathbf{R}^n$, we will define
the Lebesgue measure $m(\Omega)$ to be a certain number in $[0, +\infty]$. The
concept of measurable set will obey the following properties:


(i) (Borel property) Every open set in $\mathbf{R}^n$ is measurable, as is
every closed set.


(ii) (Complementarity) If $\Omega$ is measurable, then $\mathbf{R}^n \setminus \Omega$ is also
measurable.


(iii) (Boolean algebra property) If $(\Omega_j)_{j \in J}$ is any finite collection
of measurable sets (so $J$ is finite), then the union $\bigcup_{j \in J} \Omega_j$
and intersection $\bigcap_{j \in J} \Omega_j$ are also measurable.


(iv) ($\sigma$-algebra property) If $(\Omega_j)_{j \in J}$ is any countable collection
of measurable sets (so $J$ is countable), then the union
$\bigcup_{j \in J} \Omega_j$ and intersection $\bigcap_{j \in J} \Omega_j$ are also measurable.


Note that some of these properties are redundant; for instance, 
(iv) will imply (iii), and once one knows all open sets are measur- 
able, (ii) will imply that all closed sets are measurable also. The 
properties (i-iv) will ensure that virtually every set one cares about 
is measurable; though as indicated in the introduction, there do 
exist non-measurable sets. 

To every measurable set $\Omega$, we associate the Lebesgue measure
$m(\Omega)$ of $\Omega$, which will obey the following properties:


(v) (Empty set) The empty set $\emptyset$ has measure $m(\emptyset) = 0$.


(vi) (Positivity) We have $0 \le m(\Omega) \le +\infty$ for every measurable
set $\Omega$.


(vii) (Monotonicity) If $A \subseteq B$, and $A$ and $B$ are both measurable,
then $m(A) \le m(B)$.


(viii) (Finite sub-additivity) If $(A_j)_{j \in J}$ are a finite collection of
measurable sets, then $m(\bigcup_{j \in J} A_j) \le \sum_{j \in J} m(A_j)$.


(ix) (Finite additivity) If $(A_j)_{j \in J}$ are a finite collection of disjoint
measurable sets, then $m(\bigcup_{j \in J} A_j) = \sum_{j \in J} m(A_j)$.


(x) (Countable sub-additivity) If $(A_j)_{j \in J}$ are a countable collection
of measurable sets, then $m(\bigcup_{j \in J} A_j) \le \sum_{j \in J} m(A_j)$.


(xi) (Countable additivity) If $(A_j)_{j \in J}$ are a countable collection
of disjoint measurable sets, then $m(\bigcup_{j \in J} A_j) = \sum_{j \in J} m(A_j)$.


(xii) (Normalization) The unit cube $[0,1]^n = \{(x_1, \dots, x_n) \in \mathbf{R}^n :
0 \le x_j \le 1 \text{ for all } 1 \le j \le n\}$ has measure $m([0,1]^n) = 1$.


(xiii) (Translation invariance) If $\Omega$ is a measurable set, and $x \in \mathbf{R}^n$,
then $x + \Omega := \{x + y : y \in \Omega\}$ is also measurable, and
$m(x + \Omega) = m(\Omega)$.


Again, many of these properties are redundant; for instance 
the countable additivity property can be used to deduce the fi- 
nite additivity property, which in turn can be used to derive 
monotonicity (when combined with the positivity property). One 
can also obtain the sub-additivity properties from the additivity 
ones. Note that $m(\Omega)$ can be $+\infty$, and so in particular some of the
sums in the above properties may also equal $+\infty$. (Since everything
is positive we will never have to deal with indeterminate
forms such as $-\infty + +\infty$.)
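
As an illustration of the redundancy just mentioned, here is a sketch of how monotonicity (vii) can be deduced from finite additivity (ix) and positivity (vi). It assumes $A \subseteq B$ are measurable, and uses (ii) and (iii) to see that $B \setminus A = B \cap (\mathbf{R}^n \setminus A)$ is measurable:

```latex
\begin{align*}
B &= A \cup (B \setminus A)
   && \text{(a disjoint union, since } A \subseteq B\text{)} \\
m(B) &= m(A) + m(B \setminus A)
   && \text{(finite additivity (ix))} \\
     &\ge m(A)
   && \text{(positivity (vi): } m(B \setminus A) \ge 0\text{)}
\end{align*}
```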

Our goal for this chapter can then be stated thus: 


Theorem 18.1.1 (Existence of Lebesgue measure). There exists
a concept of a measurable set, and a way to assign a number
$m(\Omega)$ to every measurable subset $\Omega \subseteq \mathbf{R}^n$, which obeys all of the
properties (i)-(xiii).


It turns out that Lebesgue measure is pretty much unique; 
any other concept of measurability and measure which obeys ax- 
ioms (i)-(xiii) will largely coincide with the construction we give. 
However there are other measures which obey only some of the 
above axioms; also, we may be interested in concepts of measure 
for other domains than Euclidean spaces $\mathbf{R}^n$. This leads to measure
theory, which is an entire subject in itself and will not be
pursued here; however we do remark that the concept of measures 
is very important in modern probability, and in the finer points of 
analysis (e.g., in the theory of distributions). 


18.2 First attempt: Outer measure 


Before we construct Lebesgue measure, we first discuss a somewhat
naive approach to finding the measure of a set - namely,
we try to cover the set by boxes, and then add up the volume of
each box. This approach will almost work, giving us a concept
called outer measure which can be applied to every set and obeys
all of the properties (v)-(xiii) except for the additivity properties
(ix), (xi). Later we will have to modify outer measure slightly to
recover the additivity properties.

We begin by starting with the notion of an open box. 


Definition 18.2.1 (Open box). An open box (or box for short) $B$
in $\mathbf{R}^n$ is any set of the form

\[ B = \prod_{i=1}^n (a_i, b_i) := \{(x_1, \dots, x_n) \in \mathbf{R}^n : x_i \in (a_i, b_i) \text{ for all } 1 \le i \le n\}, \]

where $b_i \ge a_i$ are real numbers. We define the volume $\mathrm{vol}(B)$ of
this box to be the number

\[ \mathrm{vol}(B) := \prod_{i=1}^n (b_i - a_i) = (b_1 - a_1)(b_2 - a_2) \cdots (b_n - a_n). \]


For instance, the unit cube $(0,1)^n$ is a box, and has volume
1. In one dimension $n = 1$, boxes are the same as open intervals.
One can easily check that, in any dimension, open boxes
are indeed open. Note that if we have $b_i = a_i$ for some $i$, then
the box becomes empty, and has volume 0, but we still consider
this to be a box (albeit a rather silly one). Sometimes we will
use $\mathrm{vol}_n(B)$ instead of $\mathrm{vol}(B)$ to emphasize that we are dealing
with $n$-dimensional volume, thus for instance $\mathrm{vol}_1(B)$ would be
the length of a one-dimensional box $B$, $\mathrm{vol}_2(B)$ would be the area
of a two-dimensional box $B$, etc.
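
The volume formula can be tried out numerically. The following is a minimal sketch, not part of the text's development; the function name `box_volume` and the representation of a box as a list of pairs $(a_i, b_i)$ are assumptions made for illustration.

```python
from fractions import Fraction
from math import prod


def box_volume(sides):
    """Volume of the open box prod (a_i, b_i), given as a list of (a_i, b_i) pairs.

    Hypothetical helper: the representation of a box as a list of pairs
    is an assumption of this sketch.
    """
    for a, b in sides:
        if b < a:
            raise ValueError("each side needs b_i >= a_i")
    return prod(b - a for a, b in sides)


unit_cube = [(0, 1)] * 3                # (0,1)^3
degenerate = [(0, 1), (2, 2)]           # b_2 = a_2, so the box is empty
rect = [(Fraction(1, 2), 2), (-1, 3)]   # side lengths 3/2 and 4

print(box_volume(unit_cube))   # 1
print(box_volume(degenerate))  # 0
print(box_volume(rect))        # 6
```

Exact rational arithmetic (`Fraction`) is used so that the product is not affected by floating-point rounding.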


Remark 18.2.2. We of course expect the measure m(B) of a 
box to be the same as the volume vol(B) of that box. This is in 
fact an inevitable consequence of the axioms (i)-(xiii) (see Exercise 
18.2.5). 


Definition 18.2.3 (Covering by boxes). Let $\Omega \subseteq \mathbf{R}^n$ be a subset
of $\mathbf{R}^n$. We say that a collection $(B_j)_{j \in J}$ of boxes cover $\Omega$ iff

\[ \Omega \subseteq \bigcup_{j \in J} B_j. \]


Suppose $\Omega \subseteq \mathbf{R}^n$ can be covered by a finite or countable
collection of boxes $(B_j)_{j \in J}$. If we wish $\Omega$ to be measurable, and if
we wish to have a measure obeying the monotonicity and
sub-additivity properties (vii), (viii), (x) and if we wish $m(B_j) =
\mathrm{vol}(B_j)$ for every box $B_j$, then we must have

\[ m(\Omega) \le m\Big(\bigcup_{j \in J} B_j\Big) \le \sum_{j \in J} m(B_j) = \sum_{j \in J} \mathrm{vol}(B_j). \]

We thus conclude

\[ m(\Omega) \le \inf\Big\{ \sum_{j \in J} \mathrm{vol}(B_j) : (B_j)_{j \in J} \text{ covers } \Omega;\ J \text{ at most countable} \Big\}. \]


Inspired by this, we define 


Definition 18.2.4 (Outer measure). If $\Omega$ is a set, we define the
outer measure $m^*(\Omega)$ of $\Omega$ to be the quantity

\[ m^*(\Omega) := \inf\Big\{ \sum_{j \in J} \mathrm{vol}(B_j) : (B_j)_{j \in J} \text{ covers } \Omega;\ J \text{ at most countable} \Big\}. \]


Since $\sum_{j \in J} \mathrm{vol}(B_j)$ is non-negative, we know that $m^*(\Omega) \ge 0$
for all $\Omega$. However, it is quite possible that $m^*(\Omega)$ could equal
$+\infty$. Note that because we are allowing ourselves to use a countable
number of boxes, every subset of $\mathbf{R}^n$ has at least one
countable cover by boxes; in fact $\mathbf{R}^n$ itself can be covered by
countably many translates of the unit cube $(0,1)^n$ (how?). We
will sometimes write $m^*_n(\Omega)$ instead of $m^*(\Omega)$ to emphasize the
fact that we are using $n$-dimensional outer measure.

Note that outer measure can be defined for every single set 
(not just the measurable ones), because we can take the infimum 
of any non-empty set. It obeys several of the desired properties of 
a measure: 


Lemma 18.2.5 (Properties of outer measure). Outer measure has
the following six properties:


(v) (Empty set) The empty set $\emptyset$ has outer measure $m^*(\emptyset) = 0$.


(vi) (Positivity) We have $0 \le m^*(\Omega) \le +\infty$ for every measurable
set $\Omega$.


(vii) (Monotonicity) If $A \subseteq B \subseteq \mathbf{R}^n$, then $m^*(A) \le m^*(B)$.


(viii) (Finite sub-additivity) If $(A_j)_{j \in J}$ are a finite collection of
subsets of $\mathbf{R}^n$, then $m^*(\bigcup_{j \in J} A_j) \le \sum_{j \in J} m^*(A_j)$.


(x) (Countable sub-additivity) If $(A_j)_{j \in J}$ are a countable collection
of subsets of $\mathbf{R}^n$, then $m^*(\bigcup_{j \in J} A_j) \le \sum_{j \in J} m^*(A_j)$.


(xiii) (Translation invariance) If $\Omega$ is a subset of $\mathbf{R}^n$, and $x \in \mathbf{R}^n$,
then $m^*(x + \Omega) = m^*(\Omega)$.

Proof. See Exercise 18.2.1. □


The outer measure of a closed box is also what we expect: 


Proposition 18.2.6 (Outer measure of closed box). For any
closed box

\[ B = \prod_{i=1}^n [a_i, b_i] := \{(x_1, \dots, x_n) \in \mathbf{R}^n : x_i \in [a_i, b_i] \text{ for all } 1 \le i \le n\} \]

we have

\[ m^*(B) = \prod_{i=1}^n (b_i - a_i). \]


Proof. Clearly, we can cover the closed box $B = \prod_{i=1}^n [a_i, b_i]$ by
the open box $\prod_{i=1}^n (a_i - \varepsilon, b_i + \varepsilon)$ for every $\varepsilon > 0$. Thus we have

\[ m^*(B) \le \mathrm{vol}\Big(\prod_{i=1}^n (a_i - \varepsilon, b_i + \varepsilon)\Big) = \prod_{i=1}^n (b_i - a_i + 2\varepsilon) \]

for every $\varepsilon > 0$. Taking limits as $\varepsilon \to 0$, we obtain

\[ m^*(B) \le \prod_{i=1}^n (b_i - a_i). \]

To finish the proof, we need to show that

\[ m^*(B) \ge \prod_{i=1}^n (b_i - a_i). \]

By the definition of $m^*(B)$, it suffices to show that

\[ \sum_{j \in J} \mathrm{vol}(B_j) \ge \prod_{i=1}^n (b_i - a_i) \]

whenever $(B_j)_{j \in J}$ is a finite or countable cover of $B$.

Since $B$ is closed and bounded, it is compact (by the Heine-Borel
theorem, Theorem 12.5.7), and in particular every open
cover has a finite subcover (Theorem 12.5.8). Thus to prove the
above inequality for countable covers, it suffices to do it for finite
covers (since if $(B_j)_{j \in J'}$ is a finite subcover of $(B_j)_{j \in J}$ then
$\sum_{j \in J} \mathrm{vol}(B_j)$ will be greater than or equal to $\sum_{j \in J'} \mathrm{vol}(B_j)$).

To summarize, our goal is now to prove that

\[ \sum_{j \in J} \mathrm{vol}(B^{(j)}) \ge \prod_{i=1}^n (b_i - a_i) \qquad (18.1) \]

whenever $(B^{(j)})_{j \in J}$ is a finite cover of $\prod_{i=1}^n [a_i, b_i]$; we have changed
the subscript $B_j$ to superscript $B^{(j)}$ because we will need the
subscripts to denote components.

To prove the inequality (18.1), we shall use induction on the
dimension $n$. First we consider the base case $n = 1$. Here $B$ is
just a closed interval $B = [a,b]$, and each box $B^{(j)}$ is just an open
interval $B^{(j)} = (a_j, b_j)$. We have to show that

\[ \sum_{j \in J} (b_j - a_j) \ge (b - a). \]

To do this we use the Riemann integral. For each $j \in J$, let $f^{(j)} :
\mathbf{R} \to \mathbf{R}$ be the function such that $f^{(j)}(x) = 1$ when $x \in (a_j, b_j)$
and $f^{(j)}(x) = 0$ otherwise. Then we have that $f^{(j)}$ is Riemann
integrable (because it is piecewise constant, and compactly
supported) and

\[ \int_{-\infty}^{\infty} f^{(j)} = b_j - a_j. \]

Summing this over all $j \in J$, and interchanging the integral with
the finite sum, we have

\[ \int_{-\infty}^{\infty} \sum_{j \in J} f^{(j)} = \sum_{j \in J} (b_j - a_j). \]

But since the intervals $(a_j, b_j)$ cover $[a,b]$, we have $\sum_{j \in J} f^{(j)}(x) \ge
1$ for all $x \in [a,b]$ (why?). For all other values of $x$, we have
$\sum_{j \in J} f^{(j)}(x) \ge 0$. Thus

\[ \int_{-\infty}^{\infty} \sum_{j \in J} f^{(j)} \ge \int_{[a,b]} 1 = b - a \]

and the claim follows by combining this inequality with the
previous equality. This proves (18.1) when $n = 1$.
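
The base case can be sanity-checked on concrete data. The sketch below is an illustration only, not the argument of the proof; the helper names `covers` and `total_length` are hypothetical. It tests whether finitely many open intervals cover $[a,b]$ by a left-to-right sweep, and compares their total length with $b - a$.

```python
def covers(intervals, a, b):
    """Return True if the finitely many open intervals (l, r) cover [a, b].

    Greedy sweep: `point` is the leftmost point of [a, b] not yet known
    to be covered. It must lie in some interval; among those we jump to
    the largest right endpoint.
    """
    point = a
    while True:
        reach = [r for (l, r) in intervals if l < point < r]
        if not reach:
            return False          # `point` lies in no interval
        point = max(reach)        # [a, point) is now covered
        if point > b:
            return True


def total_length(intervals):
    """Sum of the lengths b_j - a_j."""
    return sum(r - l for (l, r) in intervals)


good = [(-5, 4), (3, 8), (7, 12)]   # covers [0, 10]
bad = [(-5, 4), (4, 12)]            # misses the point 4

print(covers(good, 0, 10), total_length(good))  # True 19
print(covers(bad, 0, 10))                       # False
```

In the covering example the total length 19 is at least $b - a = 10$, as (18.1) requires for $n = 1$. The loop terminates because `point` strictly increases through the finitely many right endpoints.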

Now assume inductively that $n > 1$, and we have already
proven the inequality (18.1) for dimensions $n - 1$. We shall use a
similar argument to the preceding one. Each box $B^{(j)}$ is now of
the form

\[ B^{(j)} = \prod_{i=1}^n (a_i^{(j)}, b_i^{(j)}). \]

We can write this as

\[ B^{(j)} = A^{(j)} \times (a_n^{(j)}, b_n^{(j)}) \]

where $A^{(j)}$ is the $n-1$-dimensional box $A^{(j)} := \prod_{i=1}^{n-1} (a_i^{(j)}, b_i^{(j)})$.
Note that

\[ \mathrm{vol}(B^{(j)}) = \mathrm{vol}_{n-1}(A^{(j)}) (b_n^{(j)} - a_n^{(j)}) \]

where we have subscripted $\mathrm{vol}_{n-1}$ by $n - 1$ to emphasize that this
is $n-1$-dimensional volume being referred to here. We similarly
write

\[ B = A \times [a_n, b_n] \]

where $A := \prod_{i=1}^{n-1} [a_i, b_i]$, and again note that

\[ \mathrm{vol}(B) = \mathrm{vol}_{n-1}(A)(b_n - a_n). \]
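For each $j \in J$, let $f^{(j)}$ be the function such that $f^{(j)}(x_n) =
\mathrm{vol}_{n-1}(A^{(j)})$ for all $x_n \in (a_n^{(j)}, b_n^{(j)})$ and $f^{(j)}(x_n) = 0$ for all other
$x_n$. Then $f^{(j)}$ is Riemann integrable and

\[ \int_{-\infty}^{\infty} f^{(j)} = \mathrm{vol}_{n-1}(A^{(j)}) (b_n^{(j)} - a_n^{(j)}) = \mathrm{vol}(B^{(j)}) \]

and hence

\[ \sum_{j \in J} \mathrm{vol}(B^{(j)}) = \int_{-\infty}^{\infty} \sum_{j \in J} f^{(j)}. \]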
Now let $x_n \in [a_n, b_n]$ and $(x_1, \dots, x_{n-1}) \in A$. Then $(x_1, \dots, x_n)$
lies in $B$, and hence lies in one of the $B^{(j)}$. Clearly we have
$x_n \in (a_n^{(j)}, b_n^{(j)})$, and $(x_1, \dots, x_{n-1}) \in A^{(j)}$. In particular, we see
that for each $x_n \in [a_n, b_n]$, the set

\[ \{A^{(j)} : j \in J;\ x_n \in (a_n^{(j)}, b_n^{(j)})\} \]

of $n-1$-dimensional boxes covers $A$. Applying the inductive
hypothesis (18.1) at dimension $n - 1$ we thus see that

\[ \sum_{j \in J :\, x_n \in (a_n^{(j)}, b_n^{(j)})} \mathrm{vol}_{n-1}(A^{(j)}) \ge \mathrm{vol}_{n-1}(A), \]

or in other words

\[ \sum_{j \in J} f^{(j)}(x_n) \ge \mathrm{vol}_{n-1}(A). \]

Integrating this over $[a_n, b_n]$, we obtain

\[ \int_{[a_n, b_n]} \sum_{j \in J} f^{(j)} \ge \mathrm{vol}_{n-1}(A)(b_n - a_n) = \mathrm{vol}(B) \]

and in particular

\[ \int_{-\infty}^{\infty} \sum_{j \in J} f^{(j)} \ge \mathrm{vol}_{n-1}(A)(b_n - a_n) = \mathrm{vol}(B) \]

since $\sum_{j \in J} f^{(j)}$ is always non-negative. Combining this with our
previous identity for $\int_{-\infty}^{\infty} \sum_{j \in J} f^{(j)}$ we obtain (18.1), and the
induction is complete. □


Once we obtain the measure of a closed box, the corresponding 
result for an open box is easy: 


Corollary 18.2.7. For any open box

\[ B = \prod_{i=1}^n (a_i, b_i) := \{(x_1, \dots, x_n) \in \mathbf{R}^n : x_i \in (a_i, b_i) \text{ for all } 1 \le i \le n\}, \]

we have

\[ m^*(B) = \prod_{i=1}^n (b_i - a_i). \]

In particular, outer measure obeys the normalization (xii).


Proof. We may assume that $b_i > a_i$ for all $i$, since if $b_i = a_i$ this
follows from Lemma 18.2.5(v). Now observe that

\[ \prod_{i=1}^n [a_i + \varepsilon, b_i - \varepsilon] \subseteq \prod_{i=1}^n (a_i, b_i) \subseteq \prod_{i=1}^n [a_i, b_i] \]

for all $\varepsilon > 0$, assuming that $\varepsilon$ is small enough that $b_i - \varepsilon > a_i + \varepsilon$
for all $i$. Applying Proposition 18.2.6 and Lemma 18.2.5(vii) we
obtain

\[ \prod_{i=1}^n (b_i - a_i - 2\varepsilon) \le m^*\Big(\prod_{i=1}^n (a_i, b_i)\Big) \le \prod_{i=1}^n (b_i - a_i). \]

Sending $\varepsilon \to 0$ and using the squeeze test (Corollary 6.4.14), one
obtains the result. □


We now compute some examples of outer measure on the real 
line R. 


Example 18.2.8. Let us compute the one-dimensional measure
of $\mathbf{R}$. Since $(-R, R) \subseteq \mathbf{R}$ for all $R > 0$, we have

\[ m^*(\mathbf{R}) \ge m^*((-R, R)) = 2R \]

by Corollary 18.2.7. Letting $R \to +\infty$ we thus see that $m^*(\mathbf{R}) =
+\infty$.


Example 18.2.9. Now let us compute the one-dimensional measure
of $\mathbf{Q}$. From Proposition 18.2.6 we see that for each rational
number $q$, the point $\{q\}$ has outer measure $m^*(\{q\}) = 0$. Since
$\mathbf{Q}$ is clearly the union $\mathbf{Q} = \bigcup_{q \in \mathbf{Q}} \{q\}$ of all these rational points
$q$, and $\mathbf{Q}$ is countable, we have

\[ m^*(\mathbf{Q}) \le \sum_{q \in \mathbf{Q}} m^*(\{q\}) = \sum_{q \in \mathbf{Q}} 0 = 0, \]

and so $m^*(\mathbf{Q})$ must equal zero. In fact, the same argument shows
that every countable set has outer measure zero. (This, incidentally,
gives another proof that the real numbers are uncountable,
Corollary 8.3.4.)


Remark 18.2.10. One consequence of the fact that m*(Q) = 0 
is that given any $\varepsilon > 0$, it is possible to cover the rationals $\mathbf{Q}$ by
a countable number of intervals whose total length is less than $\varepsilon$.
This fact is somewhat un-intuitive; can you find a more explicit 
way to construct such a countable covering of Q by short intervals? 
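
One possible construction can be sketched in code: enumerate the rationals as $q_1, q_2, q_3, \dots$ and centre an interval of length $\varepsilon / 2^{k+1}$ on $q_k$, so that the total length is at most $\varepsilon/2$. The enumeration order and the names `rationals` and `cover` below are choices made for this illustration, and only finitely many intervals can ever be listed.

```python
from fractions import Fraction
from itertools import islice
from math import gcd


def rationals():
    """Enumerate every rational exactly once: 0, then +-p/q in lowest
    terms, ordered by increasing p + q."""
    yield Fraction(0)
    s = 2
    while True:
        for p in range(1, s):
            q = s - p
            if gcd(p, q) == 1:
                yield Fraction(p, q)
                yield Fraction(-p, q)
        s += 1


def cover(eps, count):
    """First `count` intervals of the cover; the k-th interval
    (k = 1, 2, ...) is centred at the k-th rational and has length
    eps / 2**(k + 1)."""
    out = []
    for k, q in enumerate(islice(rationals(), count), start=1):
        half = eps / 2 ** (k + 2)
        out.append((q - half, q + half))
    return out


eps = Fraction(1, 100)
intervals = cover(eps, 50)
partial = sum(r - l for l, r in intervals)
print(partial < eps / 2)   # True: every partial sum stays below eps/2
```

Since every partial sum of the lengths is below $\varepsilon/2$, the full countable family has total length at most $\varepsilon/2 < \varepsilon$, even though it contains every rational number.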


Example 18.2.11. Now let us compute the one-dimensional measure
of the irrationals $\mathbf{R} \setminus \mathbf{Q}$. From finite sub-additivity we have

\[ m^*(\mathbf{R}) \le m^*(\mathbf{R} \setminus \mathbf{Q}) + m^*(\mathbf{Q}). \]

Since $\mathbf{Q}$ has outer measure 0, and $\mathbf{R}$ has outer measure $+\infty$,
we thus see that the irrationals $\mathbf{R} \setminus \mathbf{Q}$ have outer measure $+\infty$. A
similar argument shows that $[0,1] \setminus \mathbf{Q}$, the irrationals in $[0,1]$, have
outer measure 1 (why?).
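
One way to organize the "similar argument" is sketched below. It uses monotonicity and finite sub-additivity (Lemma 18.2.5), Proposition 18.2.6 for $m^*([0,1]) = 1$, and the fact that countable sets have outer measure zero:

```latex
\begin{align*}
m^*([0,1] \setminus \mathbf{Q}) &\le m^*([0,1]) = 1
  && \text{(monotonicity)} \\
1 = m^*([0,1]) &\le m^*([0,1] \setminus \mathbf{Q}) + m^*([0,1] \cap \mathbf{Q})
  && \text{(finite sub-additivity)} \\
  &= m^*([0,1] \setminus \mathbf{Q}) + 0
  && \text{(}[0,1] \cap \mathbf{Q} \text{ is countable)}
\end{align*}
```

The two inequalities together give $m^*([0,1] \setminus \mathbf{Q}) = 1$.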


Example 18.2.12. By Proposition 18.2.6, the unit interval $[0,1]$
in $\mathbf{R}$ has one-dimensional outer measure 1, but the unit interval
$\{(x, 0) : 0 \le x \le 1\}$ in $\mathbf{R}^2$ has two-dimensional outer measure 0.
Thus one-dimensional outer measure and two-dimensional outer
measure are quite different. Note that the above remarks and
countable sub-additivity imply that the entire $x$-axis of $\mathbf{R}^2$ has
two-dimensional outer measure 0, despite the fact that $\mathbf{R}$ has infinite
one-dimensional outer measure.


Exercise 18.2.1. Prove Lemma 18.2.5. (Hint: you will have to use the
definition of inf, and probably introduce a parameter $\varepsilon$. You may have
to treat separately the cases when certain outer measures are equal to
$+\infty$. (viii) can be deduced from (x) and (v). For (x), label the index
set $J$ as $J = \{j_1, j_2, j_3, \dots\}$, and for each $A_{j_k}$, pick a covering of $A_{j_k}$ by
boxes whose total volume is no larger than $m^*(A_{j_k}) + \varepsilon/2^k$.)


Exercise 18.2.2. Let $A$ be a subset of $\mathbf{R}^n$, and let $B$ be a subset of $\mathbf{R}^m$.
Note that the Cartesian product $\{(a,b) : a \in A, b \in B\}$ is then a subset
of $\mathbf{R}^{n+m}$. Show that $m^*_{n+m}(A \times B) \le m^*_n(A) m^*_m(B)$. (It is in fact true
that $m^*_{n+m}(A \times B) = m^*_n(A) m^*_m(B)$, but this is substantially harder to
prove).


In Exercises 18.2.3-18.2.5, we assume that $\mathbf{R}^n$ is a Euclidean space,
and we have a notion of measurable set in $\mathbf{R}^n$ (which may or may not
coincide with the notion of Lebesgue measurable set) and a notion of
measure (which may or may not coincide with Lebesgue measure) which
obeys axioms (i)-(xiii).


Exercise 18.2.3. 


(a) Show that if $A_1 \subseteq A_2 \subseteq A_3 \subseteq \dots$ is an increasing sequence of
measurable sets (so $A_j \subseteq A_{j+1}$ for every positive integer $j$), then we
have $m(\bigcup_{j=1}^\infty A_j) = \lim_{j \to \infty} m(A_j)$.


(b) Show that if $A_1 \supseteq A_2 \supseteq A_3 \supseteq \dots$ is a decreasing sequence of
measurable sets (so $A_j \supseteq A_{j+1}$ for every positive integer $j$), and
$m(A_1) < +\infty$, then we have $m(\bigcap_{j=1}^\infty A_j) = \lim_{j \to \infty} m(A_j)$.


Exercise 18.2.4. Show that for any positive integer $q > 1$, the open
box

\[ (0, 1/q)^n := \{(x_1, \dots, x_n) \in \mathbf{R}^n : 0 < x_j < 1/q \text{ for all } 1 \le j \le n\} \]

and the closed box

\[ [0, 1/q]^n := \{(x_1, \dots, x_n) \in \mathbf{R}^n : 0 \le x_j \le 1/q \text{ for all } 1 \le j \le n\} \]

both have measure $q^{-n}$. (Hint: first show that $m((0,1/q)^n) \le q^{-n}$ for
every $q \ge 1$ by covering $(0,1)^n$ by some translates of $(0,1/q)^n$. Using
a similar argument, show that $m([0,1/q]^n) \ge q^{-n}$. Then show that
$m([0,1/q]^n \setminus (0,1/q)^n) \le \varepsilon$ for every $\varepsilon > 0$, by covering the boundary of
$[0,1/q]^n$ with some very small boxes.)

Exercise 18.2.5. Show that for any box $B$, we have $m(B) = \mathrm{vol}(B)$. (Hint:
first prove this when the co-ordinates $a_i, b_i$ are rational, using Exercise
18.2.4. Then take limits somehow (perhaps using Q1) to obtain the
general case when the co-ordinates are real.)


Exercise 18.2.6. Use Lemma 18.2.5 and Proposition 18.2.6 to furnish 
another proof that the reals are uncountable (i.e., reprove Corollary 
8.3.4). 


18.3 Outer measure is not additive 


In light of Lemma 18.2.5, it would seem now that all we need 
to do is to verify the additivity properties (ix), (xi), and we have 
everything we need to have a usable measure. Unfortunately, these 
properties fail for outer measure, even in one dimension. 


Proposition 18.3.1 (Failure of countable additivity). There exists
a countable collection $(A_j)_{j \in J}$ of disjoint subsets of $\mathbf{R}$, such
that $m^*(\bigcup_{j \in J} A_j) \ne \sum_{j \in J} m^*(A_j)$.


Proof. We shall need some notation. Let $\mathbf{Q}$ be the rationals, and
$\mathbf{R}$ be the reals. We say that a set $A \subseteq \mathbf{R}$ is a coset of $\mathbf{Q}$ if it
is of the form $A = x + \mathbf{Q}$ for some real number $x$. For instance,
$\sqrt{2} + \mathbf{Q}$ is a coset of $\mathbf{Q}$, as is $\mathbf{Q}$ itself, since $\mathbf{Q} = 0 + \mathbf{Q}$. Note
that a coset $A$ can correspond to several values of $x$; for instance
$2 + \mathbf{Q}$ is exactly the same coset as $0 + \mathbf{Q}$. Also observe that it is
not possible for two cosets to partially overlap; if $x + \mathbf{Q}$ intersects
$y + \mathbf{Q}$ in even just a single point $z$, then $x - y$ must be rational
(why? use the identity $x - y = (x - z) - (y - z)$), and thus $x + \mathbf{Q}$
and $y + \mathbf{Q}$ must be equal (why?). So any two cosets are either
identical or disjoint.


We observe that every coset $A$ of the rationals $\mathbf{Q}$ has a non-empty
intersection with $[0,1]$. Indeed, if $A$ is a coset, then $A =
x + \mathbf{Q}$ for some real number $x$. If we then pick a rational number
$q$ in $[-x, 1 - x]$ then we see that $x + q \in [0,1]$, and thus $A \cap [0,1]$
contains $x + q$.

Let $\mathbf{R}/\mathbf{Q}$ denote the set of all cosets of $\mathbf{Q}$; note that this is
a set whose elements are themselves sets (of real numbers). For
each coset $A$ in $\mathbf{R}/\mathbf{Q}$, let us pick an element $x_A$ of $A \cap [0,1]$.
(This requires us to make an infinite number of choices, and thus
requires the axiom of choice, see Section 8.4.) Let $E$ be the set of
all such $x_A$, i.e., $E := \{x_A : A \in \mathbf{R}/\mathbf{Q}\}$. Note that $E \subseteq [0,1]$ by
construction.

Now consider the set

\[ X := \bigcup_{q \in \mathbf{Q} \cap [-1,1]} (q + E). \]

Clearly this set is contained in $[-1,2]$ (since $q + x \in [-1,2]$ whenever
$q \in [-1,1]$ and $x \in E \subseteq [0,1]$). We claim that this set
contains the interval $[0,1]$. Indeed, for any $y \in [0,1]$, we know
that $y$ must belong to some coset $A$ (for instance, it belongs to
the coset $y + \mathbf{Q}$). But we also have $x_A$ belonging to the same
coset, and thus $y - x_A$ is equal to some rational $q$. Since $y$ and
$x_A$ both live in $[0,1]$, then $q$ lives in $[-1,1]$. Since $y = q + x_A$, we
have $y \in q + E$, and hence $y \in X$ as desired.
We claim that

\[ m^*(X) \ne \sum_{q \in \mathbf{Q} \cap [-1,1]} m^*(q + E), \]

which would prove the proposition. (The sets $q + E$ are disjoint for
distinct rationals $q$, because $E$ contains exactly one element of each
coset.) To see why the claim is true, observe
that since $[0,1] \subseteq X \subseteq [-1,2]$, we have $1 \le m^*(X) \le 3$
by monotonicity and Proposition 18.2.6. For the right hand side,
observe from translation invariance that

\[ \sum_{q \in \mathbf{Q} \cap [-1,1]} m^*(q + E) = \sum_{q \in \mathbf{Q} \cap [-1,1]} m^*(E). \]

The set $\mathbf{Q} \cap [-1,1]$ is countably infinite (why?). Thus the right-hand
side is either 0 (if $m^*(E) = 0$) or $+\infty$ (if $m^*(E) > 0$). Either
way, it cannot be between 1 and 3, and the claim follows. □


Remark 18.3.2. The above proof used the axiom of choice. This 
turns out to be absolutely necessary; one can prove using some 
advanced techniques in mathematical logic that if one does not 
assume the axiom of choice, then it is possible to have a mathe- 
matical model where outer measure is countably additive. 


One can refine the above argument, and show in fact that m* 
is not finitely additive either: 


Proposition 18.3.3 (Failure of finite additivity). There exists a
finite collection $(A_j)_{j \in J}$ of disjoint subsets of $\mathbf{R}$, such that

\[ m^*\Big(\bigcup_{j \in J} A_j\Big) \ne \sum_{j \in J} m^*(A_j). \]

Proof. This is accomplished by an indirect argument. Suppose
for sake of contradiction that $m^*$ was finitely additive. Let $E$ and
$X$ be the sets introduced in Proposition 18.3.1. From countable
sub-additivity and translation invariance we have

\[ m^*(X) \le \sum_{q \in \mathbf{Q} \cap [-1,1]} m^*(q + E) = \sum_{q \in \mathbf{Q} \cap [-1,1]} m^*(E). \]

Since we know that $1 \le m^*(X) \le 3$, we thus have $m^*(E) \ne 0$,
since otherwise we would have $m^*(X) \le 0$, a contradiction.

Since $m^*(E) \ne 0$, there exists a finite integer $n > 0$ such that
$m^*(E) > 1/n$. Now let $J$ be a finite subset of $\mathbf{Q} \cap [-1,1]$ of
cardinality $3n$. If $m^*$ were finitely additive, then we would have

\[ m^*\Big(\bigcup_{q \in J} (q + E)\Big) = \sum_{q \in J} m^*(q + E) = \sum_{q \in J} m^*(E) > 3n \cdot \frac{1}{n} = 3. \]

But we know that $\bigcup_{q \in J} (q + E)$ is a subset of $X$, which has outer
measure at most 3. This contradicts monotonicity. Hence $m^*$
cannot be finitely additive. □


Remark 18.3.4. The examples here are related to the Banach-Tarski
paradox, which demonstrates (using the axiom of choice)
that one can partition the unit ball in $\mathbf{R}^3$ into a finite number of
pieces which, when rotated and translated, can be reassembled to 
form two complete unit balls! Of course, this partition involves 
non-measurable sets. We will not present this paradox here as it 
requires some group theory which is beyond the scope of this text. 


18.4 Measurable sets 


In the previous section we saw that certain sets were badly be- 
haved with respect to outer measure, in particular they could be 
used to contradict finite or countable additivity. However, those 
sets were rather pathological, being constructed using the axiom 
of choice and looking rather artificial. One would hope to be able 
to exclude them and then somehow recover finite and countable 
additivity. Fortunately, this can be done, thanks to a clever defi- 
nition of Constantin Carathéodory (1873-1950): 


Definition 18.4.1 (Lebesgue measurability). Let $E$ be a subset
of $\mathbf{R}^n$. We say that $E$ is Lebesgue measurable, or measurable for
short, iff we have the identity

\[ m^*(A) = m^*(A \cap E) + m^*(A \setminus E) \]

for every subset $A$ of $\mathbf{R}^n$. If $E$ is measurable, we define the
Lebesgue measure of $E$ to be $m(E) = m^*(E)$; if $E$ is not
measurable, we leave $m(E)$ undefined.


In other words, E being measurable means that if we use the 
set E to divide up an arbitrary set A into two parts, we keep the 
additivity property. Of course, if m* were finitely additive then 
every set E would be measurable; but we know from Proposition 
18.3.3 that not every set is finitely additive. One can think of the 
measurable sets as the sets for which finite additivity works. We 
sometimes subscript $m(E)$ as $m_n(E)$ to emphasize the fact that
we are using $n$-dimensional Lebesgue measure.
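
As a first check of the definition, here is a sketch of why the two extreme cases $E = \emptyset$ and $E = \mathbf{R}^n$ satisfy the identity for an arbitrary subset $A$ of $\mathbf{R}^n$. It uses only $m^*(\emptyset) = 0$ from Lemma 18.2.5(v):

```latex
\begin{align*}
m^*(A \cap \emptyset) + m^*(A \setminus \emptyset)
  &= m^*(\emptyset) + m^*(A) = 0 + m^*(A) = m^*(A), \\
m^*(A \cap \mathbf{R}^n) + m^*(A \setminus \mathbf{R}^n)
  &= m^*(A) + m^*(\emptyset) = m^*(A) + 0 = m^*(A).
\end{align*}
```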


The above definition is somewhat hard to work with, and in 
practice one does not verify a set is measurable directly from this 
definition. Instead, we will use this definition to prove various 
useful properties of measurable sets (Lemmas 18.4.2-18.4.11), and 
after that we will rely more or less exclusively on the properties in 
those lemmas, and no longer need to refer to the above definition. 

We begin by showing that a large number of sets are indeed 
measurable. The empty set $E = \emptyset$ and the whole space $E =
\mathbf{R}^n$ are clearly measurable (why?). Here is another example of a
measurable set: 


Lemma 18.4.2 (Half-spaces are measurable). The half-space

\[ \{(x_1, \dots, x_n) \in \mathbf{R}^n : x_n > 0\} \]

is measurable.

Proof. See Exercise 18.4.3. □


Remark 18.4.3. A similar argument will also show that any half-space
of the form $\{(x_1, \dots, x_n) \in \mathbf{R}^n : x_j > 0\}$ or $\{(x_1, \dots, x_n) \in
\mathbf{R}^n : x_j < 0\}$ for some $1 \le j \le n$ is measurable.


Now for some more properties of measurable sets. 


Lemma 18.4.4 (Properties of measurable sets).


(a) If $E$ is measurable, then $\mathbf{R}^n \setminus E$ is also measurable.


(b) (Translation invariance) If $E$ is measurable, and $x \in \mathbf{R}^n$,
then $x + E$ is also measurable, and $m(x + E) = m(E)$.


(c) If $E_1$ and $E_2$ are measurable, then $E_1 \cap E_2$ and $E_1 \cup E_2$ are
measurable.


(d) (Boolean algebra property) If $E_1, E_2, \dots, E_N$ are measurable,
then $\bigcup_{j=1}^N E_j$ and $\bigcap_{j=1}^N E_j$ are measurable.


(e) Every open box, and every closed box, is measurable.


(f) Any set $E$ of outer measure zero (i.e., $m^*(E) = 0$) is
measurable.


Proof. See Exercise 18.4.4. □


From Lemma 18.4.4, we have proven properties (ii), (iii), (xiii) 
on our wish list of measurable sets, and we are making progress 
towards (i). We also have finite additivity (property (ix) on our 
wish list): 


Lemma 18.4.5 (Finite additivity). If $(E_j)_{j \in J}$ are a finite
collection of disjoint measurable sets and $A$ is any set (not necessarily
measurable), we have

\[ m^*\Big(A \cap \bigcup_{j \in J} E_j\Big) = \sum_{j \in J} m^*(A \cap E_j). \]

Furthermore, we have $m(\bigcup_{j \in J} E_j) = \sum_{j \in J} m(E_j)$.

Proof. See Exercise 18.4.6. □


Remark 18.4.6. Lemma 18.4.5 and Proposition 18.3.3, when 
combined, imply that there exist non-measurable sets: see Ex- 
ercise 18.4.5. 


Corollary 18.4.7. If $A \subseteq B$ are two measurable sets, then $B \setminus A$
is also measurable, and

\[ m(B \setminus A) = m(B) - m(A). \]

Proof. See Exercise 18.4.7. □


Now we show countable additivity. 


Lemma 18.4.8 (Countable additivity). If $(E_j)_{j \in J}$ are a countable
collection of disjoint measurable sets, then $\bigcup_{j \in J} E_j$ is measurable,
and $m(\bigcup_{j \in J} E_j) = \sum_{j \in J} m(E_j)$.


Proof. Let $E := \bigcup_{j \in J} E_j$. Our first task will be to show that $E$ is
measurable. Thus, let $A$ be any subset of $\mathbf{R}^n$; we need to
show that

\[ m^*(A) = m^*(A \cap E) + m^*(A \setminus E). \]

Since $J$ is countable, we may write $J = \{j_1, j_2, j_3, \dots\}$. Note
that

\[ A \cap E = \bigcup_{k=1}^\infty (A \cap E_{j_k}) \]

(why?) and hence by countable sub-additivity

\[ m^*(A \cap E) \le \sum_{k=1}^\infty m^*(A \cap E_{j_k}). \]

We rewrite this as

\[ m^*(A \cap E) \le \sup_{N \ge 1} \sum_{k=1}^N m^*(A \cap E_{j_k}). \]

Let $F_N$ be the set $F_N := \bigcup_{k=1}^N E_{j_k}$. Since the $A \cap E_{j_k}$ are all
disjoint, and their union is $A \cap F_N$, we see from Lemma 18.4.5
that

\[ \sum_{k=1}^N m^*(A \cap E_{j_k}) = m^*(A \cap F_N) \]

and hence

\[ m^*(A \cap E) \le \sup_{N \ge 1} m^*(A \cap F_N). \]

Now we look at $A \setminus E$. Since $F_N \subseteq E$ (why?), we have $A \setminus E \subseteq
A \setminus F_N$ (why?). By monotonicity, we thus have

\[ m^*(A \setminus E) \le m^*(A \setminus F_N) \]

for all $N$. In particular, we see that

\[ m^*(A \cap E) + m^*(A \setminus E) \le \sup_{N \ge 1} m^*(A \cap F_N) + m^*(A \setminus E)
\le \sup_{N \ge 1} \big[ m^*(A \cap F_N) + m^*(A \setminus F_N) \big]. \]

But from Lemma 18.4.4(d) we know that $F_N$ is measurable, and
hence

\[ m^*(A \cap F_N) + m^*(A \setminus F_N) = m^*(A). \]

Putting this all together we obtain

\[ m^*(A \cap E) + m^*(A \setminus E) \le m^*(A). \]

But from finite sub-additivity we have

\[ m^*(A \cap E) + m^*(A \setminus E) \ge m^*(A) \]

and the claim follows. This shows that $E$ is measurable.
To finish the lemma, we need to show that $m(E)$ is equal to
$\sum_{j \in J} m(E_j)$. We first observe from countable sub-additivity that

\[ m(E) \le \sum_{j \in J} m(E_j) = \sum_{k=1}^\infty m(E_{j_k}). \]

On the other hand, by finite additivity and monotonicity we have

\[ m(E) \ge m(F_N) = \sum_{k=1}^N m(E_{j_k}). \]

Taking limits as $N \to \infty$ we obtain

\[ m(E) \ge \sum_{k=1}^\infty m(E_{j_k}) \]

and thus we have

\[ m(E) = \sum_{k=1}^\infty m(E_{j_k}) = \sum_{j \in J} m(E_j) \]

as desired. □


This proves property (xi) on our wish list. Next, we do count- 
able unions and intersections. 


Lemma 18.4.9 ($\sigma$-algebra property). If $(\Omega_j)_{j \in J}$ are any countable
collection of measurable sets (so $J$ is countable), then the
union $\bigcup_{j \in J} \Omega_j$ and the intersection $\bigcap_{j \in J} \Omega_j$ are also measurable.


Proof. See Exercise 18.4.8. □


The final property left to verify on our wish list is (i). We first
need a preliminary lemma. 


Lemma 18.4.10. Every open set can be written as a countable
or finite union of open boxes.


Proof. We first need some notation. Call a box $B = \prod_{i=1}^{n} (a_i, b_i)$
rational if all of its components $a_i, b_i$ are rational numbers. Observe
that there are only a countable number of rational boxes
(this is since a rational box is described by $2n$ rational numbers,
and so the set of rational boxes has the same cardinality as $\mathbf{Q}^{2n}$.
But $\mathbf{Q}$ is countable, and the Cartesian product of any finite number
of countable sets is countable; see Corollaries 8.1.14, 8.1.15).

We make the following claim: given any open ball $B(x,r)$,
there exists a rational box $B$ which is contained in $B(x,r)$ and
which contains $x$. To prove this claim, write $x = (x_1, \ldots, x_n)$. For
each $1 \le i \le n$, let $a_i$ and $b_i$ be rational numbers such that

$$x_i - \frac{r}{n} < a_i < x_i < b_i < x_i + \frac{r}{n}.$$

Then it is clear that the box $\prod_{i=1}^{n} (a_i, b_i)$ is rational and contains $x$.
A simple computation using Pythagoras' theorem (or the triangle
inequality) also shows that this box is contained in $B(x,r)$; we
leave this to the reader.

Now let $E$ be an open set, and let $\Sigma$ be the set of all rational
boxes $B$ which are subsets of $E$, and consider the union $\bigcup_{B \in \Sigma} B$ of
all those boxes. Clearly, this union is contained in $E$, since every
box in $\Sigma$ is contained in $E$ by construction. On the other hand,
since $E$ is open, we see that for every $x \in E$ there is a ball $B(x,r)$
contained in $E$, and by the previous claim this ball contains a
rational box which contains $x$. In particular, $x$ is contained in
$\bigcup_{B \in \Sigma} B$. Thus we have

$$E = \bigcup_{B \in \Sigma} B$$

as desired; note that $\Sigma$ is countable or finite because it is a subset
of the set of all rational boxes, which is countable. □
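
The claim about rational boxes inside balls can be checked numerically. The following is a minimal sketch and not part of the original text: the helper names `rational_box` and `box_inside_ball` are hypothetical, floating-point inputs are treated as exact rationals, and the margin $r/(2n)$ is just one convenient choice satisfying $x_i - r/n < a_i < x_i < b_i < x_i + r/n$.

```python
from fractions import Fraction
from itertools import product

def rational_box(x, r):
    """Return rational endpoints (a_i, b_i) with x_i - r/n < a_i < x_i < b_i < x_i + r/n."""
    n = len(x)
    xs = [Fraction(c) for c in x]    # a float is an exact (dyadic) rational
    delta = Fraction(r) / (2 * n)    # half of the allowed margin r/n (illustrative choice)
    return [(c - delta, c + delta) for c in xs]

def box_inside_ball(box, x, r):
    """Check, in exact arithmetic, that every corner of the box is within distance r of x."""
    xs = [Fraction(c) for c in x]
    r_squared = Fraction(r) ** 2
    return all(
        sum((c - xi) ** 2 for c, xi in zip(corner, xs)) < r_squared
        for corner in product(*box)
    )

x, r = (0.5, -1.25, 3.0), 0.1
box = rational_box(x, r)
print(box_inside_ball(box, x, r))  # True
```

Because a ball is convex, it is enough to test the corners of the box: each corner lies at distance $r/(2\sqrt{n})$ from $x$, which is the Pythagoras computation left to the reader above.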


Lemma 18.4.11 (Borel property). Every open set, and every 
closed set, is Lebesgue measurable. 


Proof. It suffices to do this for open sets, since the claim for closed 
sets then follows by Lemma 18.4.4(a) (i.e., property (ii)). Let E 
be an open set. By Lemma 18.4.10, E is the countable union of 
boxes. Since we already know that boxes are measurable, and that 
the countable union of measurable sets is measurable, the claim 
follows. □


The construction of Lebesgue measure and its basic properties 
are now complete. Now we make the next step in constructing 
the Lebesgue integral - describing the class of functions we can 
integrate. 


Exercise 18.4.1. If $A$ is an open interval in $\mathbf{R}$, show that $m^*(A) =
m^*(A \cap (0, \infty)) + m^*(A \setminus (0, \infty))$.


Exercise 18.4.2. If $A$ is an open box in $\mathbf{R}^n$, and $E$ is the half-plane
$E := \{(x_1, \ldots, x_n) \in \mathbf{R}^n : x_n > 0\}$, show that $m^*(A) = m^*(A \cap E) +
m^*(A \setminus E)$. (Hint: use Exercise 18.4.1.)


Exercise 18.4.3. Prove Lemma 18.4.2. (Hint: use Exercise 18.4.2.)

Exercise 18.4.4. Prove Lemma 18.4.4. (Hints: for (c), first prove that

$$m^*(A) = m^*(A \cap E_1 \cap E_2) + m^*(A \cap E_1 \setminus E_2) + m^*(A \cap E_2 \setminus E_1) + m^*(A \setminus (E_1 \cup E_2)).$$

A Venn diagram may be helpful. Also you may need the finite sub-additivity
property. Use (c) to prove (d), and use (b), (d) and the various
versions of Lemma 18.4.2 to prove (e)).




Exercise 18.4.5. Show that the set E used in the proof of Propositions 
18.3.1 and 18.3.3 is non-measurable. 


Exercise 18.4.6. Prove Lemma 18.4.5. 
Exercise 18.4.7. Use Lemma 18.4.5 to prove Corollary 18.4.7. 


Exercise 18.4.8. Prove Lemma 18.4.9. (Hint: for the countable union
problem, write $J = \{j_1, j_2, \ldots\}$, write $F_N := \bigcup_{k=1}^{N} \Omega_{j_k}$, and write $E_N :=
F_N \setminus F_{N-1}$, with the understanding that $F_0$ is the empty set. Then apply
Lemma 18.4.8. For the countable intersection problem, use what you
just did and Lemma 18.4.4(a).)


Exercise 18.4.9. Let $A \subseteq \mathbf{R}^2$ be the set $A := [0,1]^2 \setminus \mathbf{Q}^2$; i.e., $A$ consists
of all the points $(x,y)$ in $[0,1]^2$ such that $x$ and $y$ are not both rational.
Show that $A$ is measurable and $m(A) = 1$, but that $A$ has no interior
points. (Hint: it's easier to use the properties of outer measure and
measure, including those in the exercises above, than to try to do this
problem from first principles.)


Exercise 18.4.10. Let $A \subseteq B \subseteq \mathbf{R}^n$. Show that if $B$ is Lebesgue measurable
with measure zero, then $A$ is also Lebesgue measurable with
measure zero.


18.5 Measurable functions 


In the theory of the Riemann integral, we are only able to integrate 
a certain class of functions - the Riemann integrable functions. We 
will now be able to integrate a much larger range of functions - 
the measurable functions. More precisely, we can only integrate 
those measurable functions which are absolutely integrable - but 
more on that later. 


Definition 18.5.1 (Measurable functions). Let $\Omega$ be a measurable
subset of $\mathbf{R}^n$, and let $f : \Omega \to \mathbf{R}^m$ be a function. A function $f$
is measurable iff $f^{-1}(V)$ is measurable for every open set $V \subseteq \mathbf{R}^m$.


As discussed earlier, most sets that we deal with in real life are 
measurable, so it is only natural to learn that most functions we 
deal with in real life are also measurable. For instance, continuous 
functions are automatically measurable: 




Lemma 18.5.2 (Continuous functions are measurable). Let $\Omega$ be
a measurable subset of $\mathbf{R}^n$, and let $f : \Omega \to \mathbf{R}^m$ be continuous.
Then $f$ is also measurable.


Proof. Let $V$ be any open subset of $\mathbf{R}^m$. Then since $f$ is continuous,
$f^{-1}(V)$ is open relative to $\Omega$ (see Theorem 13.1.5(c)), i.e.,
$f^{-1}(V) = W \cap \Omega$ for some open set $W \subseteq \mathbf{R}^n$ (see Proposition
12.3.4(a)). Since $W$ is open, it is measurable; since $\Omega$ is measurable,
$W \cap \Omega$ is also measurable. □


Because of Lemma 18.4.10, we have an easy criterion to test
whether a function is measurable or not:


Lemma 18.5.3. Let $\Omega$ be a measurable subset of $\mathbf{R}^n$, and let
$f : \Omega \to \mathbf{R}^m$ be a function. Then $f$ is measurable if and only if
$f^{-1}(B)$ is measurable for every open box $B$.


Proof. See Exercise 18.5.1. □


Corollary 18.5.4. Let $\Omega$ be a measurable subset of $\mathbf{R}^n$, and let
$f : \Omega \to \mathbf{R}^m$ be a function. Suppose that $f = (f_1, \ldots, f_m)$, where
$f_j : \Omega \to \mathbf{R}$ is the $j$th co-ordinate of $f$. Then $f$ is measurable if
and only if all of the $f_j$ are individually measurable.


Proof. See Exercise 18.5.2. □


Unfortunately, it is not true that the composition of two mea- 
surable functions is automatically measurable; however we can do 
the next best thing: a continuous function applied to a measurable 
function is measurable. 


Lemma 18.5.5. Let $\Omega$ be a measurable subset of $\mathbf{R}^n$, and let
$W$ be an open subset of $\mathbf{R}^m$. If $f : \Omega \to W$ is measurable, and
$g : W \to \mathbf{R}^p$ is continuous, then $g \circ f : \Omega \to \mathbf{R}^p$ is measurable.


Proof. See Exercise 18.5.3. □


This has an immediate corollary: 




Corollary 18.5.6. Let $\Omega$ be a measurable subset of $\mathbf{R}^n$. If $f :
\Omega \to \mathbf{R}$ is a measurable function, then so is $|f|$, $\max(f,0)$, and
$\min(f,0)$.


Proof. Apply Lemma 18.5.5 with $g(x) := |x|$, $g(x) := \max(x, 0)$,
and $g(x) := \min(x, 0)$. □


A slightly less immediate corollary: 


Corollary 18.5.7. Let $\Omega$ be a measurable subset of $\mathbf{R}^n$. If $f :
\Omega \to \mathbf{R}$ and $g : \Omega \to \mathbf{R}$ are measurable functions, then so is $f + g$,
$f - g$, $fg$, $\max(f,g)$, and $\min(f,g)$. If $g(x) \ne 0$ for all $x \in \Omega$,
then $f/g$ is also measurable.


Proof. Consider $f + g$. We can write this as $k \circ h$, where $h : \Omega \to
\mathbf{R}^2$ is the function $h(x) = (f(x), g(x))$, and $k : \mathbf{R}^2 \to \mathbf{R}$ is the
function $k(a, b) := a + b$. Since $f, g$ are measurable, then $h$ is also
measurable by Corollary 18.5.4. Since $k$ is continuous, we thus
see from Lemma 18.5.5 that $k \circ h$ is measurable, as desired. A
similar argument deals with all the other cases; the only thing
concerning the $f/g$ case is that the space $\mathbf{R}^2$ must be replaced
with $\{(a,b) \in \mathbf{R}^2 : b \ne 0\}$ in order to keep the map $(a,b) \mapsto a/b$
continuous and well-defined. □


Another characterization of measurable functions is given by 


Lemma 18.5.8. Let $\Omega$ be a measurable subset of $\mathbf{R}^n$, and let
$f : \Omega \to \mathbf{R}$ be a function. Then $f$ is measurable if and only if
$f^{-1}((a, \infty))$ is measurable for every real number $a$.


Proof. See Exercise 18.5.4. □


Inspired by this lemma, we extend the notion of a measurable
function to the extended real number system $\mathbf{R}^* := \mathbf{R} \cup \{+\infty\} \cup
\{-\infty\}$:


Definition 18.5.9 (Measurable functions in the extended reals).
Let $\Omega$ be a measurable subset of $\mathbf{R}^n$. A function $f : \Omega \to \mathbf{R}^*$ is
said to be measurable iff $f^{-1}((a, \infty])$ is measurable for every real
number $a$.


Note that Lemma 18.5.8 ensures that the notion of measurability
for functions taking values in the extended reals $\mathbf{R}^*$ is compatible
with that for functions taking values in just the reals $\mathbf{R}$.

Measurability behaves well with respect to limits: 


Lemma 18.5.10 (Limits of measurable functions are measurable).
Let $\Omega$ be a measurable subset of $\mathbf{R}^n$. For each positive
integer $n$, let $f_n : \Omega \to \mathbf{R}^*$ be a measurable function. Then the
functions $\sup_{n \ge 1} f_n$, $\inf_{n \ge 1} f_n$, $\limsup_{n \to \infty} f_n$, and $\liminf_{n \to \infty} f_n$
are also measurable. In particular, if the $f_n$ converge pointwise to
another function $f : \Omega \to \mathbf{R}$, then $f$ is also measurable.


Proof. We first prove the claim about $\sup_{n \ge 1} f_n$. Call this function
$g$. We have to prove that $g^{-1}((a, \infty])$ is measurable for every
$a$. But by the definition of supremum, we have

$$g^{-1}((a, \infty]) = \bigcup_{n \ge 1} f_n^{-1}((a, \infty])$$

(why?), and the claim follows since the countable union of measurable
sets is again measurable.

A similar argument works for $\inf_{n \ge 1} f_n$. The claims for lim sup
and lim inf then follow from the identities

$$\limsup_{n \to \infty} f_n = \inf_{N \ge 1} \sup_{n \ge N} f_n$$

and

$$\liminf_{n \to \infty} f_n = \sup_{N \ge 1} \inf_{n \ge N} f_n$$

(see Definition 6.4.6). □


As you can see, just about anything one does to a measurable 
function will produce another measurable function. This is basi- 
cally why almost every function one deals with in mathematics is 
measurable. (Indeed, the only way to construct non-measurable 
functions is via artificial means such as invoking the axiom of 
choice.) 




Exercise 18.5.1. Prove Lemma 18.5.3. (Hint: use Lemma 18.4.10 and
the $\sigma$-algebra property.)


Exercise 18.5.2. Use Lemma 18.5.3 to deduce Corollary 18.5.4.

Exercise 18.5.3. Prove Lemma 18.5.5.


Exercise 18.5.4. Prove Lemma 18.5.8. (Hint: use Lemma 18.5.3. As a
preliminary step, you may need to show that if $f^{-1}((a, \infty))$ is measurable
for all $a$, then $f^{-1}([a, \infty))$ is also measurable for all $a$.)


Exercise 18.5.5. Let $f : \mathbf{R}^n \to \mathbf{R}$ be Lebesgue measurable, and let
$g : \mathbf{R}^n \to \mathbf{R}$ be a function which agrees with $f$ outside of a set of
measure zero, thus there exists a set $A \subseteq \mathbf{R}^n$ of measure zero such that
$f(x) = g(x)$ for all $x \in \mathbf{R}^n \setminus A$. Show that $g$ is also Lebesgue measurable.
(Hint: use Exercise 18.4.10.)


Chapter 19 


Lebesgue integration 


In Chapter 11, we approached the Riemann integral by first inte- 
grating a particularly simple class of functions, namely the piece- 
wise constant functions. Among other things, piecewise constant 
functions only attain a finite number of values (as opposed to most 
functions in real life, which can take an infinite number of values). 
Once one learns how to integrate piecewise constant functions, 
one can then integrate other Riemann integrable functions by a 
similar procedure. 

We shall use a similar philosophy to construct the Lebesgue 
integral. We shall begin by considering a special subclass of measurable
functions - the simple functions. Then we will show how
to integrate simple functions, and then from there we will inte- 
grate all measurable functions (or at least the absolutely integrable 
ones). 


19.1 Simple functions 


Definition 19.1.1 (Simple functions). Let $\Omega$ be a measurable
subset of $\mathbf{R}^n$, and let $f : \Omega \to \mathbf{R}$ be a measurable function. We
say that $f$ is a simple function if the image $f(\Omega)$ is finite. In other
words, there exists a finite number of real numbers $c_1, c_2, \ldots, c_N$
such that for every $x \in \Omega$, we have $f(x) = c_j$ for some $1 \le j \le N$.


Example 19.1.2. Let $\Omega$ be a measurable subset of $\mathbf{R}^n$, and let $E$
be a measurable subset of $\Omega$. We define the characteristic function
$\chi_E : \Omega \to \mathbf{R}$ by setting $\chi_E(x) := 1$ if $x \in E$, and $\chi_E(x) := 0$ if
$x \notin E$. (In some texts, $\chi_E$ is also written $1_E$, and is referred to as
an indicator function.) Then $\chi_E$ is a measurable function (why?),
and is a simple function, because the image $\chi_E(\Omega)$ is $\{0,1\}$ (or
$\{0\}$ if $E$ is empty, or $\{1\}$ if $E = \Omega$).


We remark on three basic properties of simple functions: that 
they form a vector space, that they are linear combinations of 
characteristic functions, and that they approximate measurable 
functions. More precisely, we have the following three lemmas: 


Lemma 19.1.3. Let $\Omega$ be a measurable subset of $\mathbf{R}^n$, and let
$f : \Omega \to \mathbf{R}$ and $g : \Omega \to \mathbf{R}$ be simple functions. Then $f + g$ is also
a simple function. Also, for any scalar $c \in \mathbf{R}$, the function $cf$ is
also a simple function.


Proof. See Exercise 19.1.1. □


Lemma 19.1.4. Let $\Omega$ be a measurable subset of $\mathbf{R}^n$, and let $f :
\Omega \to \mathbf{R}$ be a simple function. Then there exists a finite number of
real numbers $c_1, \ldots, c_N$, and a finite number of disjoint measurable
sets $E_1, E_2, \ldots, E_N$ in $\Omega$, such that $f = \sum_{i=1}^{N} c_i \chi_{E_i}$.


Proof. See Exercise 19.1.2. □


Lemma 19.1.5. Let $\Omega$ be a measurable subset of $\mathbf{R}^n$, and let
$f : \Omega \to \mathbf{R}$ be a measurable function. Suppose that $f$ is always
non-negative, i.e., $f(x) \ge 0$ for all $x \in \Omega$. Then there exists a
sequence $f_1, f_2, f_3, \ldots$ of simple functions, $f_n : \Omega \to \mathbf{R}$, such that
the $f_n$ are non-negative and increasing,

$$0 \le f_1(x) \le f_2(x) \le f_3(x) \le \ldots \text{ for all } x \in \Omega$$

and converge pointwise to $f$:

$$\lim_{n \to \infty} f_n(x) = f(x) \text{ for all } x \in \Omega.$$

Proof. See Exercise 19.1.3. □


We now show how to compute the integral of simple functions: 




Definition 19.1.6 (Lebesgue integral of simple functions). Let
$\Omega$ be a measurable subset of $\mathbf{R}^n$, and let $f : \Omega \to \mathbf{R}$ be a simple
function which is non-negative; thus $f$ is measurable and the image
$f(\Omega)$ is finite and contained in $[0, \infty)$. We then define the Lebesgue
integral $\int_\Omega f$ of $f$ on $\Omega$ by

$$\int_\Omega f := \sum_{\lambda \in f(\Omega); \lambda > 0} \lambda \, m(\{x \in \Omega : f(x) = \lambda\}).$$

We will also sometimes write $\int_\Omega f$ as $\int_\Omega f \, dm$ (to emphasize
the role of Lebesgue measure $m$) or use a dummy variable such as
$x$, e.g., $\int_\Omega f(x) \, dx$.


Example 19.1.7. Let $f : \mathbf{R} \to \mathbf{R}$ be the function which equals
3 on the interval $[1,2]$, equals 4 on the interval $(2,4)$, and is zero
everywhere else. Then (with $\Omega = \mathbf{R}$)

$$\int_\Omega f = 3 \times m([1,2]) + 4 \times m((2,4)) = 3 \times 1 + 4 \times 2 = 11.$$

Or if $g : \mathbf{R} \to \mathbf{R}$ is the function which equals 1 on $[0, \infty)$ and is
zero everywhere else, then

$$\int_\Omega g = 1 \times m([0, \infty)) = 1 \times (+\infty) = +\infty.$$

Thus the integral of a simple function can equal $+\infty$. (The
reason why we restrict this integral to non-negative functions is
to avoid ever encountering the indefinite form $+\infty + (-\infty)$.)
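
The two computations in this example can be reproduced mechanically. The sketch below is illustrative only: the function name `integrate_simple` and the representation of a simple function as a list of (value, measure) pairs over disjoint sets are assumptions made for this example, not notation from the text.

```python
import math

def integrate_simple(pieces):
    """Integral of a non-negative simple function.

    pieces: list of (value, measure) pairs, one per disjoint measurable set;
    a measure may be math.inf. Pieces with value 0 are skipped, as in the
    definition, so the product 0 * inf never has to be evaluated.
    """
    total = 0.0
    for value, measure in pieces:
        if value < 0:
            raise ValueError("only non-negative simple functions are handled")
        if value == 0:
            continue
        total += value * measure
    return total

# f equals 3 on [1,2], 4 on (2,4), and 0 elsewhere (a set of infinite measure).
f = [(3, 1.0), (4, 2.0), (0, math.inf)]
print(integrate_simple(f))  # 11.0

# g equals 1 on [0, infinity) and 0 elsewhere.
g = [(1, math.inf), (0, math.inf)]
print(integrate_simple(g))  # inf
```

Skipping the pieces where the value is zero mirrors the restriction $\lambda > 0$ in Definition 19.1.6.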


Remark 19.1.8. Note that this definition of integral corresponds 
to one’s intuitive notion of integration (at least of non-negative 
functions) as the area under the graph of the function (or volume, 
if one is in higher dimensions). 


Another formulation of the integral for non-negative simple 
functions is as follows. 


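Lemma 19.1.9. Let $\Omega$ be a measurable subset of $\mathbf{R}^n$, and let
$E_1, \ldots, E_N$ be a finite number of disjoint measurable subsets in $\Omega$.
Let $c_1, \ldots, c_N$ be non-negative numbers (not necessarily distinct).
Then we have

$$\int_\Omega \sum_{j=1}^{N} c_j \chi_{E_j} = \sum_{j=1}^{N} c_j m(E_j).$$


Proof. We can assume that none of the $c_j$ are zero, since we can
just remove them from the sum on both sides of the equation.
Let $f := \sum_{j=1}^{N} c_j \chi_{E_j}$. Then $f(x)$ is either equal to one of the $c_j$
(if $x \in E_j$) or equal to 0 (if $x \notin \bigcup_{j=1}^{N} E_j$). Thus $f$ is a simple
function, and $f(\Omega) \subseteq \{0\} \cup \{c_j : 1 \le j \le N\}$. Thus, by the
definition,

$$\int_\Omega f = \sum_{\lambda \in \{c_j : 1 \le j \le N\}} \lambda \, m(\{x \in \Omega : f(x) = \lambda\})
= \sum_{\lambda \in \{c_j : 1 \le j \le N\}} \lambda \, m\Big( \bigcup_{1 \le j \le N : c_j = \lambda} E_j \Big).$$

But by the finite additivity property of Lebesgue measure, this is
equal to

$$\sum_{\lambda \in \{c_j : 1 \le j \le N\}} \lambda \sum_{1 \le j \le N : c_j = \lambda} m(E_j)
= \sum_{\lambda \in \{c_j : 1 \le j \le N\}} \ \sum_{1 \le j \le N : c_j = \lambda} c_j m(E_j).$$

Each $j$ appears exactly once in this sum, since $c_j$ is only equal
to exactly one value of $\lambda$. So the above expression is equal to
$\sum_{1 \le j \le N} c_j m(E_j)$, as desired. □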

Some basic properties of Lebesgue integration of non-negative 
simple functions: 


Proposition 19.1.10. Let $\Omega$ be a measurable set, and let $f : \Omega \to
\mathbf{R}$ and $g : \Omega \to \mathbf{R}$ be non-negative simple functions.


(a) We have $0 \le \int_\Omega f \le \infty$. Furthermore, we have $\int_\Omega f = 0$ if
and only if $m(\{x \in \Omega : f(x) \ne 0\}) = 0$.


(b) We have $\int_\Omega (f + g) = \int_\Omega f + \int_\Omega g$.


(c) For any positive number $c$, we have $\int_\Omega cf = c \int_\Omega f$.


(d) If $f(x) \le g(x)$ for all $x \in \Omega$, then we have $\int_\Omega f \le \int_\Omega g$.


We make a very convenient notational convention: if a property
$P(x)$ holds for all points in $\Omega$, except for a set of measure
zero, then we say that $P$ holds for almost every point in $\Omega$. Thus
(a) asserts that $\int_\Omega f = 0$ if and only if $f$ is zero for almost every
point in $\Omega$.


Proof. From Lemma 19.1.4 or from the formula

$$f = \sum_{\lambda \in f(\Omega) \setminus \{0\}} \lambda \chi_{\{x \in \Omega : f(x) = \lambda\}}$$

we can write $f$ as a combination of characteristic functions, say

$$f = \sum_{j=1}^{N} c_j \chi_{E_j},$$

where $E_1, \ldots, E_N$ are disjoint subsets of $\Omega$ and the $c_j$ are positive.
Similarly we can write

$$g = \sum_{k=1}^{M} d_k \chi_{F_k},$$

where $F_1, \ldots, F_M$ are disjoint subsets of $\Omega$ and the $d_k$ are positive.


(a) Since $\int_\Omega f = \sum_{j=1}^{N} c_j m(E_j)$ it is clear that the integral is
between 0 and infinity. If $f$ is zero almost everywhere, then
all of the $E_j$ must have measure zero (why?) and so $\int_\Omega f = 0$.
Conversely, if $\int_\Omega f = 0$, then $\sum_{j=1}^{N} c_j m(E_j) = 0$, which can
only happen when all of the $m(E_j)$ are zero (since all the
$c_j$ are positive). But then $\bigcup_{j=1}^{N} E_j$ has measure zero, and
hence $f$ is zero almost everywhere in $\Omega$.




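(b) Write $E_0 := \Omega \setminus \bigcup_{j=1}^{N} E_j$ and $c_0 := 0$, then we have $\Omega =
E_0 \cup E_1 \cup \ldots \cup E_N$ and

$$f = \sum_{j=0}^{N} c_j \chi_{E_j}.$$

Similarly if we write $F_0 := \Omega \setminus \bigcup_{k=1}^{M} F_k$ and $d_0 := 0$ then

$$g = \sum_{k=0}^{M} d_k \chi_{F_k}.$$

Since $\Omega = E_0 \cup \ldots \cup E_N = F_0 \cup \ldots \cup F_M$, we have

$$f = \sum_{j=0}^{N} \sum_{k=0}^{M} c_j \chi_{E_j \cap F_k}$$

and

$$g = \sum_{k=0}^{M} \sum_{j=0}^{N} d_k \chi_{E_j \cap F_k}$$

and hence

$$f + g = \sum_{0 \le j \le N; 0 \le k \le M} (c_j + d_k) \chi_{E_j \cap F_k}.$$

By Lemma 19.1.9, we thus have

$$\int_\Omega (f + g) = \sum_{0 \le j \le N; 0 \le k \le M} (c_j + d_k) m(E_j \cap F_k).$$

On the other hand, we have

$$\int_\Omega f = \sum_{0 \le j \le N} c_j m(E_j) = \sum_{0 \le j \le N; 0 \le k \le M} c_j m(E_j \cap F_k)$$

and similarly

$$\int_\Omega g = \sum_{0 \le k \le M} d_k m(F_k) = \sum_{0 \le j \le N; 0 \le k \le M} d_k m(E_j \cap F_k)$$

and the claim (b) follows.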




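(c) Since $cf = \sum_{j=1}^{N} c c_j \chi_{E_j}$, we have $\int_\Omega cf = \sum_{j=1}^{N} c c_j m(E_j)$.
Since $\int_\Omega f = \sum_{j=1}^{N} c_j m(E_j)$, the claim follows.


(d) Write $h := g - f$. Then $h$ is simple and non-negative and
$g = f + h$, hence by (b) we have $\int_\Omega g = \int_\Omega f + \int_\Omega h$. But by
(a) we have $\int_\Omega h \ge 0$, and the claim follows.


□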
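
The common-refinement argument used for part (b) can be illustrated on a small case. This is a sketch under assumptions that are not in the text: $\Omega$ is taken to be $[0,1)$, every set $E_j$, $F_k$ is a half-open interval stored as a `(lo, hi)` pair, and the helper names `intersect`, `integral` and `add_simple` are hypothetical.

```python
def intersect(I, J):
    """Intersection of two half-open intervals [lo, hi), or None if it is empty."""
    lo, hi = max(I[0], J[0]), min(I[1], J[1])
    return (lo, hi) if lo < hi else None

def integral(pieces):
    """Sum of c_j * m(E_j) for pieces (c_j, E_j) with E_j an interval of finite length."""
    return sum(c * (hi - lo) for c, (lo, hi) in pieces)

def add_simple(f, g):
    """Represent f + g on the refinement {E_j intersect F_k}, with value c_j + d_k."""
    out = []
    for c, I in f:
        for d, J in g:
            K = intersect(I, J)
            if K is not None:
                out.append((c + d, K))
    return out

# Both partitions cover Omega = [0, 1); the zero-valued pieces play the role of E_0, F_0.
f = [(2, (0.0, 0.5)), (0, (0.5, 1.0))]
g = [(1, (0.0, 0.25)), (3, (0.25, 1.0))]
h = add_simple(f, g)
print(integral(h), integral(f) + integral(g))  # 3.5 3.5
```

Including the zero-valued pieces matters: without $E_0$ and $F_0$ the two partitions would not both cover $\Omega$, and the double sum over $E_j \cap F_k$ would miss part of the domain.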


Exercise 19.1.1. Prove Lemma 19.1.3.

Exercise 19.1.2. Prove Lemma 19.1.4.

Exercise 19.1.3. Prove Lemma 19.1.5. (Hint: set

$$f_n(x) := \sup\Big\{ \frac{j}{2^n} : j \in \mathbf{Z}, \ \frac{j}{2^n} \le \min(f(x), 2^n) \Big\},$$

i.e., $f_n(x)$ is the greatest integer multiple of $2^{-n}$ which does not exceed
either $f(x)$ or $2^n$. You may wish to draw a picture to see how $f_1, f_2, f_3$,
etc. work. Then prove that $f_n$ obeys all the required properties.)
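
As a complement to drawing a picture, the truncation in the hint can be tabulated at a single point. The sketch below is only an illustration: the name `dyadic_approx` and the sample value $f(x) = 1.7^2$ are choices made for this example, and it does not replace the proof asked for in the exercise.

```python
import math

def dyadic_approx(value, n):
    """Greatest integer multiple of 2**-n not exceeding min(value, 2**n), for value >= 0."""
    capped = min(value, 2.0 ** n)
    # Multiplying by a power of two is exact in binary floating point.
    return math.floor(capped * 2 ** n) / 2 ** n

value = 1.7 ** 2  # stands in for f(x) at one fixed point x
approximations = [dyadic_approx(value, n) for n in range(1, 21)]

for n, a in enumerate(approximations[:5], start=1):
    print(n, a)
```

At this point the approximations are non-decreasing in $n$, never exceed the sampled value, and once $2^n$ exceeds the value they lie within $2^{-n}$ of it.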


19.2 Integration of non-negative measurable functions


We now pass from the integration of non-negative simple functions
to the integration of non-negative measurable functions. We will
allow our measurable functions to take the value of $+\infty$ sometimes.


Definition 19.2.1 (Majorization). Let $f : \Omega \to \mathbf{R}$ and $g : \Omega \to \mathbf{R}$
be functions. We say that $f$ majorizes $g$, or $g$ minorizes $f$, iff we
have $f(x) \ge g(x)$ for all $x \in \Omega$.


We sometimes use the phrase “f dominates g” instead of “f 
majorizes g”. 


Definition 19.2.2 (Lebesgue integral for non-negative functions).
Let $\Omega$ be a measurable subset of $\mathbf{R}^n$, and let $f : \Omega \to [0, \infty]$
be measurable and non-negative. Then we define the Lebesgue
integral $\int_\Omega f$ of $f$ on $\Omega$ to be

$$\int_\Omega f := \sup\Big\{ \int_\Omega s : s \text{ is simple and non-negative, and minorizes } f \Big\}.$$


Remark 19.2.3. The reader should compare this notion to that 
of a lower Riemann integral from Definition 11.3.2. Interestingly, 
we will not need to match this lower integral with an upper integral 
here. 


Remark 19.2.4. Note that if $\Omega'$ is any measurable subset of
$\Omega$, then we can define $\int_{\Omega'} f$ as well by restricting $f$ to $\Omega'$, thus
$\int_{\Omega'} f := \int_{\Omega'} f|_{\Omega'}$.


We have to check that this definition is consistent with our
previous notion of Lebesgue integral for non-negative simple functions;
in other words, if $f : \Omega \to \mathbf{R}$ is a non-negative simple
function, then the value of $\int_\Omega f$ given by this definition should
be the same as the one given in the previous definition. But this
is clear because $f$ certainly minorizes itself, and any other non-negative
simple function $s$ which minorizes $f$ will have an integral
$\int_\Omega s$ less than or equal to $\int_\Omega f$, thanks to Proposition 19.1.10(d).


Remark 19.2.5. Note that $\int_\Omega f$ is always at least 0, since 0 is
simple, non-negative, and minorizes $f$. Of course, $\int_\Omega f$ could equal
$+\infty$.


Some basic properties of the Lebesgue integral on non-negative
measurable functions (which supersede Proposition 19.1.10):


Proposition 19.2.6. Let $\Omega$ be a measurable set, and let $f : \Omega \to
[0, \infty]$ and $g : \Omega \to [0, \infty]$ be non-negative measurable functions.


(a) We have $0 \le \int_\Omega f \le \infty$. Furthermore, we have $\int_\Omega f = 0$ if
and only if $f(x) = 0$ for almost every $x \in \Omega$.


(b) For any positive number $c$, we have $\int_\Omega cf = c \int_\Omega f$.


(c) If $f(x) \le g(x)$ for all $x \in \Omega$, then we have $\int_\Omega f \le \int_\Omega g$.


(d) If $f(x) = g(x)$ for almost every $x \in \Omega$, then $\int_\Omega f = \int_\Omega g$.

(e) If $\Omega' \subseteq \Omega$ is measurable, then $\int_{\Omega'} f = \int_\Omega f \chi_{\Omega'} \le \int_\Omega f$.

Proof. See Exercise 19.2.1. □


Remark 19.2.7. Proposition 19.2.6(d) is quite interesting; it says 
that one can modify the values of a function on any measure zero 
set (e.g., you can modify a function on every rational number), 
and not affect its integral at all. It is as if no individual point, 
or even a measure zero collection of points, has any “vote” in 
what the integral of a function should be; only the collective set 
of points has an influence on an integral. 


Remark 19.2.8. Note that we do not yet try to interchange sums
and integrals. From the definition it is fairly easy to prove that
$\int_\Omega (f + g) \ge \int_\Omega f + \int_\Omega g$ (Exercise 19.2.2), but to prove equality
requires more work and will be done later.


As we have seen in previous chapters, we cannot always inter- 
change an integral with a limit (or with limit-like concepts such 
as supremum). However, with the Lebesgue integral it is possible 
to do so if the functions are increasing: 


Theorem 19.2.9 (Lebesgue monotone convergence theorem). Let
$\Omega$ be a measurable subset of $\mathbf{R}^n$, and let $(f_n)_{n=1}^{\infty}$ be a sequence of
non-negative measurable functions from $\Omega$ to $\mathbf{R}$ which are increasing
in the sense that

$$0 \le f_1(x) \le f_2(x) \le f_3(x) \le \ldots \text{ for all } x \in \Omega.$$

(Note we are assuming that $f_n(x)$ is increasing with respect to $n$;
this is a different notion from $f_n(x)$ increasing with respect to $x$.)
Then we have

$$0 \le \int_\Omega f_1 \le \int_\Omega f_2 \le \int_\Omega f_3 \le \ldots$$

and

$$\int_\Omega \sup_n f_n = \sup_n \int_\Omega f_n.$$




Proof. The first conclusion is clear from Proposition 19.2.6(c).
Now we prove the second conclusion. From Proposition 19.2.6(c)
again we have

$$\int_\Omega \sup_m f_m \ge \int_\Omega f_n$$

for every $n$; taking suprema in $n$ we obtain

$$\int_\Omega \sup_m f_m \ge \sup_n \int_\Omega f_n$$

which is one half of the desired conclusion. To finish the proof we
have to show

$$\int_\Omega \sup_m f_m \le \sup_n \int_\Omega f_n.$$

From the definition of $\int_\Omega \sup_m f_m$, it will suffice to show that

$$\int_\Omega s \le \sup_n \int_\Omega f_n$$

for all simple non-negative functions $s$ which minorize $\sup_n f_n$.
Fix $s$. We will show that

$$(1 - \varepsilon) \int_\Omega s \le \sup_n \int_\Omega f_n$$

for every $0 < \varepsilon < 1$; the claim then follows by taking limits as
$\varepsilon \to 0$.
Fix $\varepsilon$. By construction of $s$, we have

$$s(x) \le \sup_n f_n(x)$$

for every $x \in \Omega$. Hence, for every $x \in \Omega$ there exists an $N$ (depending
on $x$) such that

$$f_N(x) \ge (1 - \varepsilon) s(x).$$

Since the $f_n$ are increasing, this will imply that $f_n(x) \ge (1 - \varepsilon) s(x)$
for all $n \ge N$. Thus, if we define the sets $E_n$ by

$$E_n := \{x \in \Omega : f_n(x) \ge (1 - \varepsilon) s(x)\}$$




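then we have $E_1 \subseteq E_2 \subseteq E_3 \subseteq \ldots$ and $\bigcup_{n=1}^{\infty} E_n = \Omega$.
From Proposition 19.2.6(cdf) we have

$$(1 - \varepsilon) \int_{E_n} s = \int_{E_n} (1 - \varepsilon) s \le \int_{E_n} f_n \le \int_\Omega f_n$$

so to finish the argument it will suffice to show that

$$\sup_n \int_{E_n} s = \int_\Omega s.$$

Since $s$ is a simple function, we may write $s = \sum_{j=1}^{N} c_j \chi_{F_j}$ for
some measurable $F_j$ and positive $c_j$. Since

$$\int_\Omega s = \sum_{j=1}^{N} c_j m(F_j)$$

and

$$\int_{E_n} s = \int_{E_n} \sum_{j=1}^{N} c_j \chi_{F_j \cap E_n} = \sum_{j=1}^{N} c_j m(F_j \cap E_n)$$

it thus suffices to show that

$$\sup_n m(F_j \cap E_n) = m(F_j)$$

for each $j$. But this follows from Exercise 18.2.3(a). □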


This theorem is extremely useful. For instance, we can now 
interchange addition and integration: 


Lemma 19.2.10 (Interchange of addition and integration). Let
$\Omega$ be a measurable subset of $\mathbf{R}^n$, and let $f : \Omega \to [0, \infty]$ and $g :
\Omega \to [0, \infty]$ be measurable functions. Then $\int_\Omega (f + g) = \int_\Omega f + \int_\Omega g$.


Proof. By Lemma 19.1.5, there exists a sequence $0 \le s_1 \le s_2 \le
\ldots \le f$ of simple functions such that $\sup_n s_n = f$, and similarly
a sequence $0 \le t_1 \le t_2 \le \ldots \le g$ of simple functions such that
$\sup_n t_n = g$. Since the $s_n$ are increasing and the $t_n$ are increasing,
it is then easy to check that $s_n + t_n$ is also increasing and
$\sup_n (s_n + t_n) = f + g$ (why?). By the monotone convergence
theorem (Theorem 19.2.9) we thus have

$$\int_\Omega f = \sup_n \int_\Omega s_n$$

$$\int_\Omega g = \sup_n \int_\Omega t_n$$

$$\int_\Omega (f + g) = \sup_n \int_\Omega (s_n + t_n).$$

But by Proposition 19.1.10(b) we have $\int_\Omega (s_n + t_n) = \int_\Omega s_n + \int_\Omega t_n$.
By Proposition 19.1.10(d), $\int_\Omega s_n$ and $\int_\Omega t_n$ are both increasing in
$n$, so

$$\sup_n \Big( \int_\Omega s_n + \int_\Omega t_n \Big) = \Big( \sup_n \int_\Omega s_n \Big) + \Big( \sup_n \int_\Omega t_n \Big)$$

and the claim follows. □


Of course, once one can interchange an integral with a sum of 
two functions, one can handle an integral and any finite number of 
functions by induction. More surprisingly, one can handle infinite 
sums as well of non-negative functions: 


Corollary 19.2.11. If $\Omega$ is a measurable subset of $\mathbf{R}^n$, and $g_1,
g_2, \ldots$ are a sequence of non-negative functions from $\Omega$ to $[0, \infty]$,
then

$$\int_\Omega \sum_{n=1}^{\infty} g_n = \sum_{n=1}^{\infty} \int_\Omega g_n.$$

Proof. See Exercise 19.2.3. □


Remark 19.2.12. Note that we do not need to assume anything
about the convergence of the above sums; it may well happen
that both sides are equal to $+\infty$. However, we do need to assume
non-negativity; see Exercise 19.2.4.


One could similarly ask whether we could interchange limits
and integrals; in other words, is it true that

$$\int_\Omega \lim_{n \to \infty} f_n = \lim_{n \to \infty} \int_\Omega f_n?$$

Unfortunately, this is not true, as the following "moving bump"
example shows. For each $n = 1, 2, 3, \ldots$, let $f_n : \mathbf{R} \to \mathbf{R}$ be the
function $f_n = \chi_{[n,n+1)}$. Then $\lim_{n \to \infty} f_n(x) = 0$ for every $x$, but
$\int_{\mathbf{R}} f_n = 1$ for every $n$, and hence $\lim_{n \to \infty} \int_{\mathbf{R}} f_n = 1 \ne 0$. In
other words, the limiting function $\lim_{n \to \infty} f_n$ can end up having
significantly smaller integral than any of the original integrals.
However, the following very useful lemma of Fatou shows that the
reverse cannot happen - there is no way the limiting function has
larger integral than the (limit of the) original integrals:


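Lemma 19.2.13 (Fatou's lemma). Let $\Omega$ be a measurable subset
of $\mathbf{R}^n$, and let $f_1, f_2, \ldots$ be a sequence of non-negative functions
from $\Omega$ to $[0, \infty]$. Then

$$\int_\Omega \liminf_{n \to \infty} f_n \le \liminf_{n \to \infty} \int_\Omega f_n.$$

Proof. Recall that

$$\liminf_{n \to \infty} f_n = \sup_n \Big( \inf_{m \ge n} f_m \Big)$$

and hence by the monotone convergence theorem

$$\int_\Omega \liminf_{n \to \infty} f_n = \sup_n \int_\Omega \Big( \inf_{m \ge n} f_m \Big).$$

By Proposition 19.2.6(c) we have

$$\int_\Omega \Big( \inf_{m \ge n} f_m \Big) \le \int_\Omega f_j$$

for every $j \ge n$; taking infima in $j$ we obtain

$$\int_\Omega \Big( \inf_{m \ge n} f_m \Big) \le \inf_{j \ge n} \int_\Omega f_j.$$

Thus

$$\int_\Omega \liminf_{n \to \infty} f_n \le \sup_n \inf_{j \ge n} \int_\Omega f_j = \liminf_{n \to \infty} \int_\Omega f_n$$

as desired. □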
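
The "moving bump" example shows that the inequality in Fatou's lemma can be strict, and this is easy to check numerically. The sketch below is illustrative only: it replaces the Lebesgue integral by a left Riemann sum on a fixed grid (which happens to be exact for these particular step functions), and the names `bump`, `GRID`, `STEP` and `integral` are assumptions made for this example.

```python
def bump(n, x):
    """Characteristic function of the interval [n, n+1), evaluated at x."""
    return 1.0 if n <= x < n + 1 else 0.0

STEP = 0.25                              # grid spacing, exactly representable in binary
GRID = [k * STEP for k in range(160)]    # sample points covering [0, 40)

def integral(f):
    """Left Riemann sum on GRID; exact for step functions with integer breakpoints in [0, 40)."""
    return sum(f(x) for x in GRID) * STEP

N_MAX = 30
integrals = [integral(lambda x, n=n: bump(n, x)) for n in range(1, N_MAX + 1)]

# Pointwise, f_n(x) = 0 as soon as n > x, so the pointwise limit (and lim inf) is 0.
eventually_zero = all(
    bump(n, x) == 0.0 for x in GRID for n in range(int(x) + 1, int(x) + 50)
)

print(min(integrals), max(integrals))  # 1.0 1.0
print(eventually_zero)                 # True
```

So the integral of the pointwise lim inf is 0 while every individual integral equals 1, which is consistent with Fatou's lemma and shows that equality need not hold.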




Note that we are allowing our functions to take the value $+\infty$
at some points. It is even possible for a function to take the
value $+\infty$ but still have a finite integral; for instance, if $E$ is a
measure zero set, and $f : \Omega \to \mathbf{R}^*$ is equal to $+\infty$ on $E$ but
equals 0 everywhere else, then $\int_\Omega f = 0$ by Proposition 19.2.6(a).
However, if the integral is finite, the function must be finite almost
everywhere:


Lemma 19.2.14. Let $\Omega$ be a measurable subset of $\mathbf{R}^n$, and let
$f : \Omega \to [0, \infty]$ be a non-negative measurable function such that
$\int_\Omega f$ is finite. Then $f$ is finite almost everywhere (i.e., the set
$\{x \in \Omega : f(x) = +\infty\}$ has measure zero).


Proof. See Exercise 19.2.5. □


From Corollary 19.2.11 and Lemma 19.2.14 one has a useful
lemma:


Lemma 19.2.15 (Borel-Cantelli lemma). Let $\Omega_1, \Omega_2, \ldots$ be measurable
subsets of $\mathbf{R}^n$ such that $\sum_{n=1}^{\infty} m(\Omega_n)$ is finite. Then the
set

$$\{x \in \mathbf{R}^n : x \in \Omega_n \text{ for infinitely many } n\}$$

is a set of measure zero. In other words, almost every point belongs
to only finitely many $\Omega_n$.


Proof. See Exercise 19.2.6. □


Exercise 19.2.1. Prove Proposition 19.2.6. (Hint: do not attempt to
mimic the proof of Proposition 19.1.10; rather, try to use Proposition
19.1.10 and Definition 19.2.2. For one direction of part (a), start with
$\int_\Omega f = 0$ and conclude that $m(\{x \in \Omega : f(x) > 1/n\}) = 0$ for every
$n = 1, 2, 3, \ldots$, and then use the countable sub-additivity. To prove (e),
first prove it for simple functions.)


Exercise 19.2.2. Let $\Omega$ be a measurable subset of $\mathbf{R}^n$, and let $f : \Omega \to
[0, +\infty]$ and $g : \Omega \to [0, +\infty]$ be measurable functions. Without using
Theorem 19.2.9 or Lemma 19.2.10, prove that $\int_\Omega (f + g) \ge \int_\Omega f + \int_\Omega g$.

Exercise 19.2.3. Prove Corollary 19.2.11. (Hint: use the monotone convergence
theorem with $f_N := \sum_{n=1}^{N} g_n$.)




Exercise 19.2.4. For each $n = 1, 2, 3, \ldots$, let $f_n : \mathbf{R} \to \mathbf{R}$ be the function
$f_n = \chi_{[n,n+1)} - \chi_{[n+1,n+2)}$; i.e., let $f_n(x)$ equal $+1$ when $x \in [n, n+1)$,
equal $-1$ when $x \in [n+1, n+2)$, and 0 everywhere else. Show that

$$\int_{\mathbf{R}} \sum_{n=1}^{\infty} f_n \ne \sum_{n=1}^{\infty} \int_{\mathbf{R}} f_n.$$

Explain why this does not contradict Corollary 19.2.11.

Exercise 19.2.5. Prove Lemma 19.2.14.


Exercise 19.2.6. Use Corollary 19.2.11 and Lemma 19.2.14 to prove
Lemma 19.2.15. (Hint: use the indicator functions $\chi_{\Omega_n}$.)


Exercise 19.2.7. Let $p > 2$ and $c > 0$. Using the Borel-Cantelli lemma,
show that the set

$$\Big\{ x \in [0,1] : \Big| x - \frac{a}{q} \Big| \le \frac{c}{q^p} \text{ for infinitely many positive integers } a, q \Big\}$$

has measure zero. (Hint: one only has to consider those integers $a$ in
the range $0 \le a \le q$ (why?). Use Corollary 11.6.5 to show that the sum
$\sum_{q=1}^{\infty} \frac{c(q+1)}{q^p}$ is finite.)

Exercise 19.2.8. Call a real number $x \in \mathbf{R}$ diophantine if there exist real
numbers $p, C > 0$ such that $|x - \frac{a}{q}| > C/|q|^p$ for all non-zero integers $q$
and all integers $a$. Using Exercise 19.2.7, show that almost every real
number is diophantine. (Hint: first work in the interval $[0, 1]$. Show that
one can take $p$ and $C$ to be rational and one can also take $p > 2$. Then
use the fact that the countable union of measure zero sets has measure
zero.)


Exercise 19.2.9. For every positive integer $n$, let $f_n : \mathbf{R} \to [0, \infty)$ be a
non-negative measurable function such that

$$\int_{\mathbf{R}} f_n \le \frac{1}{4^n}.$$

Show that for every $\varepsilon > 0$, there exists a set $E$ of Lebesgue measure
$m(E) \le \varepsilon$ such that $f_n(x)$ converges pointwise to zero for all $x \in \mathbf{R} \setminus E$.
(Hint: first prove that $m(\{x \in \mathbf{R} : f_n(x) > \frac{1}{\varepsilon 2^n}\}) \le \frac{\varepsilon}{2^n}$ for all $n =
1, 2, 3, \ldots$, and then consider the union of all the sets $\{x \in \mathbf{R} : f_n(x) >
\frac{1}{\varepsilon 2^n}\}$.)

Exercise 19.2.10. For every positive integer $n$, let $f_n : \mathbf{R} \to [0, \infty)$ be
a non-negative measurable function such that $f_n$ converges pointwise
to zero. Show that for every $\varepsilon > 0$, there exists a set $E$ of Lebesgue
measure $m(E) \le \varepsilon$ such that $f_n(x)$ converges uniformly to zero for all
$x \in \mathbf{R} \setminus E$. (This is a special case of Egoroff's theorem. To prove it, first
show that for any positive integer $m$, we can find an $N > 0$ such that
$m(\{x \in \mathbf{R} : f_n(x) > 1/m\}) \le \varepsilon/2^m$ for all $n \ge N$.)

Exercise 19.2.11. Give an example of a bounded non-negative function
$f : \mathbf{N} \times \mathbf{N} \to \mathbf{R}^+$ such that $\sum_{m=1}^{\infty} f(n, m)$ converges for every $n$, and
such that $\lim_{n \to \infty} f(n, m)$ exists for every $m$, but such that

$$\lim_{n \to \infty} \sum_{m=1}^{\infty} f(n, m) \ne \sum_{m=1}^{\infty} \lim_{n \to \infty} f(n, m).$$

(Hint: modify the moving bump example. It is even possible to use
a function $f$ which only takes the values 0 and 1.) This shows that
interchanging limits and infinite sums can be dangerous.


19.3 Integration of absolutely integrable functions 


We have now completed the theory of the Lebesgue integral for
non-negative functions. Now we consider how to integrate functions
which can be both positive and negative. However, we do
wish to avoid the indefinite expression $+\infty + (-\infty)$, so we will
restrict our attention to a subclass of measurable functions - the
absolutely integrable functions.


Definition 19.3.1 (Absolutely integrable functions). Let $\Omega$ be a
measurable subset of $\mathbf{R}^n$. A measurable function $f : \Omega \to \mathbf{R}^*$ is
said to be absolutely integrable if the integral $\int_\Omega |f|$ is finite.


Of course, $|f|$ is always non-negative, so this definition makes
sense even if $f$ changes sign. Absolutely integrable functions are
also known as $L^1(\Omega)$ functions.

If $f : \Omega \to \mathbf{R}^*$ is a function, we define the positive part
$f^+ : \Omega \to [0, \infty]$ and negative part $f^- : \Omega \to [0, \infty]$ by the formulae

$$f^+ := \max(f, 0); \qquad f^- := -\min(f, 0).$$

From Corollary 18.5.6 we know that $f^+$ and $f^-$ are measurable.
Observe also that $f^+$ and $f^-$ are non-negative, that $f = f^+ - f^-$,
and $|f| = f^+ + f^-$. (Why?)
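
As a quick sanity check of these identities, here is a small worked example. The choice $f(x) := x$ on $\Omega = [-1, 2]$ is an assumption made purely for illustration and does not come from the text.

```latex
% Illustrative assumption (not from the text): f(x) = x on \Omega = [-1,2].
\[
f^+(x) = \max(x,0) =
  \begin{cases} x & 0 \le x \le 2,\\ 0 & -1 \le x < 0,\end{cases}
\qquad
f^-(x) = -\min(x,0) =
  \begin{cases} 0 & 0 \le x \le 2,\\ -x & -1 \le x < 0.\end{cases}
\]
% At every point at most one of f^+(x), f^-(x) is non-zero, hence
\[
f^+(x) - f^-(x) = x = f(x),
\qquad
f^+(x) + f^-(x) = |x| = |f(x)|.
\]
```

The same case split (on the sign of $f(x)$) is one way to approach the general "Why?" above.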




Definition 19.3.2 (Lebesgue integral). Let $f : \Omega \to \mathbf{R}^*$ be an
absolutely integrable function. We define the Lebesgue integral
$\int_\Omega f$ of $f$ to be the quantity

$$\int_\Omega f := \int_\Omega f^+ - \int_\Omega f^-.$$


Note that since $f$ is absolutely integrable, $\int_\Omega f^+$ and $\int_\Omega f^-$
are less than or equal to $\int_\Omega |f|$ and hence are finite. Thus $\int_\Omega f$ is
always finite; we never encounter the indeterminate form
$+\infty - (+\infty)$.
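
The definition can be traced through on a concrete case. The following is only a sketch under an assumed example, $f(x) := x$ on $\Omega = [-1, 2]$ (not taken from the text), with the one-dimensional integrals of the continuous pieces evaluated as ordinary Riemann integrals.

```latex
% Illustrative assumption (not from the text): f(x) = x on \Omega = [-1,2],
% so that f^+(x) = \max(x,0) and f^-(x) = \max(-x,0).
\[
\int_\Omega f^+ = \int_0^2 x\,dx = 2,
\qquad
\int_\Omega f^- = \int_{-1}^0 (-x)\,dx = \frac12,
\]
% Both pieces are finite, so f is absolutely integrable:
\[
\int_\Omega |f| = \int_\Omega f^+ + \int_\Omega f^- = \frac52 < \infty,
\qquad
\int_\Omega f := \int_\Omega f^+ - \int_\Omega f^- = 2 - \frac12 = \frac32 .
\]
```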

Note that this definition is consistent with our previous definition
of the Lebesgue integral for non-negative functions, since
if $f$ is non-negative then $f^+ = f$ and $f^- = 0$. We also have the
useful triangle inequality

$$\left| \int_\Omega f \right| \leq \int_\Omega f^+ + \int_\Omega f^- = \int_\Omega |f| \tag{19.1}$$


(Exercise 19.3.1). 
Some other properties of the Lebesgue integral: 


Proposition 19.3.3. Let $\Omega$ be a measurable set, and let $f : \Omega \to \mathbf{R}$
and $g : \Omega \to \mathbf{R}$ be absolutely integrable functions.

(a) For any real number $c$ (positive, zero, or negative), we have
that $cf$ is absolutely integrable and $\int_\Omega cf = c \int_\Omega f$.

(b) The function $f + g$ is absolutely integrable, and
$\int_\Omega (f + g) = \int_\Omega f + \int_\Omega g$.

(c) If $f(x) \leq g(x)$ for all $x \in \Omega$, then we have $\int_\Omega f \leq \int_\Omega g$.

(d) If $f(x) = g(x)$ for almost every $x \in \Omega$, then $\int_\Omega f = \int_\Omega g$.

Proof. See Exercise 19.3.2. □


As mentioned in the previous section, one cannot necessarily
interchange limits and integrals, $\lim \int f_n = \int \lim f_n$, as the “moving
bump example” showed. However, it is possible to exclude the
moving bump example, and successfully interchange limits and
integrals, if we know that the functions $f_n$ are all majorized by a
single absolutely integrable function. This important theorem is
known as the Lebesgue dominated convergence theorem, and is
extremely useful:


Theorem 19.3.4 (Lebesgue dominated convergence theorem). Let
$\Omega$ be a measurable subset of $\mathbf{R}^n$, and let $f_1, f_2, \ldots$ be a sequence
of measurable functions from $\Omega$ to $\mathbf{R}^*$ which converge pointwise.
Suppose also that there is an absolutely integrable function
$F : \Omega \to [0, \infty]$ such that $|f_n(x)| \leq F(x)$ for all $x \in \Omega$ and all
$n = 1, 2, 3, \ldots$. Then

$$\int_\Omega \lim_{n \to \infty} f_n = \lim_{n \to \infty} \int_\Omega f_n.$$


Proof. Let $f : \Omega \to \mathbf{R}^*$ be the function $f(x) := \lim_{n \to \infty} f_n(x)$; this
function exists by hypothesis. By Lemma 18.5.10, $f$ is measurable.
Also, since $|f_n(x)| \leq F(x)$ for all $n$ and all $x \in \Omega$, we see that
each $f_n$ is absolutely integrable, and by taking limits we obtain
$|f(x)| \leq F(x)$ for all $x \in \Omega$, so $f$ is also absolutely integrable. Our
task is to show that $\lim_{n \to \infty} \int_\Omega f_n = \int_\Omega f$.

The functions $F + f_n$ are non-negative and converge pointwise
to $F + f$. So by Fatou’s lemma (Lemma 19.2.13)

$$\int_\Omega (F + f) \leq \liminf_{n \to \infty} \int_\Omega (F + f_n)$$

and thus

$$\int_\Omega f \leq \liminf_{n \to \infty} \int_\Omega f_n.$$

But the functions $F - f_n$ are also non-negative and converge pointwise
to $F - f$. So by Fatou’s lemma again

$$\int_\Omega (F - f) \leq \liminf_{n \to \infty} \int_\Omega (F - f_n).$$

Since the right-hand side is $\int_\Omega F - \limsup_{n \to \infty} \int_\Omega f_n$ (why did the
lim inf become a lim sup?), we thus have

$$\int_\Omega f \geq \limsup_{n \to \infty} \int_\Omega f_n.$$

Thus the lim inf and lim sup of $\int_\Omega f_n$ are both equal to $\int_\Omega f$, as
desired. □
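
To see what the dominating function buys us, here is a hedged illustration. The sequences $f_n(x) := x^n$ on $[0,1]$ and $g_n := \chi_{[n,n+1)}$ on $\mathbf{R}$ (one common form of a moving bump) are assumptions chosen for this sketch; the value $\int_{[0,1]} x^n = \frac{1}{n+1}$ is computed as an ordinary Riemann integral.

```latex
% Dominated case (illustrative): f_n(x) := x^n on \Omega = [0,1], majorized by F := 1.
\[
|f_n(x)| = x^n \le 1 = F(x) \ \text{ on } [0,1],
\qquad
\int_{[0,1]} F = 1 < \infty,
\]
\[
\lim_{n\to\infty} f_n(x) =
  \begin{cases} 0 & 0 \le x < 1,\\ 1 & x = 1,\end{cases}
\qquad
\int_{[0,1]} f_n = \frac{1}{n+1} \;\longrightarrow\; 0 = \int_{[0,1]} \lim_{n\to\infty} f_n .
\]
% Undominated case (illustrative): g_n := \chi_{[n,n+1)} on R.
% Any F with g_n \le F for all n satisfies F \ge \chi_{[1,\infty)}, so \int_R F = +\infty,
% and the conclusion of the theorem indeed fails:
\[
\lim_{n\to\infty} \int_{\mathbf{R}} g_n = 1 \;\ne\; 0 = \int_{\mathbf{R}} \lim_{n\to\infty} g_n .
\]
```

In the first case the pointwise limit is non-zero only at the single point $x = 1$, a set of measure zero, which is why its integral vanishes.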


Finally, we record a lemma which is not particularly interesting 
in itself, but will have some useful consequences later in these 
notes. 


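Definition 19.3.5 (Upper and lower Lebesgue integral). Let $\Omega$ be
a measurable subset of $\mathbf{R}^n$, and let $f : \Omega \to \mathbf{R}$ be a function (not
necessarily measurable). We define the upper Lebesgue integral
$\overline{\int}_\Omega f$ to be

$$\overline{\int}_\Omega f := \inf\left\{ \int_\Omega g : g \text{ is an absolutely integrable function from } \Omega \text{ to } \mathbf{R} \text{ that majorizes } f \right\}$$

and the lower Lebesgue integral $\underline{\int}_\Omega f$ to be

$$\underline{\int}_\Omega f := \sup\left\{ \int_\Omega g : g \text{ is an absolutely integrable function from } \Omega \text{ to } \mathbf{R} \text{ that minorizes } f \right\}.$$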


It is easy to see that $\underline{\int}_\Omega f \leq \overline{\int}_\Omega f$ (why? use Proposition
19.3.3(c)). When $f$ is absolutely integrable then equality occurs
(why?). The converse is also true:


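Lemma 19.3.6. Let $\Omega$ be a measurable subset of $\mathbf{R}^n$, and let
$f : \Omega \to \mathbf{R}$ be a function (not necessarily measurable). Let $A$ be a
real number, and suppose $\overline{\int}_\Omega f = \underline{\int}_\Omega f = A$. Then $f$ is absolutely
integrable, and

$$\int_\Omega f = \overline{\int}_\Omega f = \underline{\int}_\Omega f = A.$$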


Proof. By definition of upper Lebesgue integral, for every integer
$n \geq 1$ we may find an absolutely integrable function $f_n^+ : \Omega \to \mathbf{R}$
which majorizes $f$ such that

$$\int_\Omega f_n^+ \leq A + \frac{1}{n}.$$

Similarly we may find an absolutely integrable function $f_n^- : \Omega \to \mathbf{R}$
which minorizes $f$ such that

$$\int_\Omega f_n^- \geq A - \frac{1}{n}.$$

Let $F^+ := \inf_n f_n^+$ and $F^- := \sup_n f_n^-$. Then $F^+$ and $F^-$ are
measurable (by Lemma 18.5.10) and absolutely integrable (because
they are squeezed between the absolutely integrable functions $f_1^+$
and $f_1^-$, for instance). Also, $F^+$ majorizes $f$ and $F^-$ minorizes $f$.


Finally, we have

$$\int_\Omega F^+ \leq \int_\Omega f_n^+ \leq A + \frac{1}{n}$$

for every $n$, and hence

$$\int_\Omega F^+ \leq A.$$

Similarly we have

$$\int_\Omega F^- \geq A.$$

But $F^+$ majorizes $F^-$, and hence $\int_\Omega F^+ \geq \int_\Omega F^-$. Hence we must
have

$$\int_\Omega F^+ = \int_\Omega F^- = A.$$

In particular

$$\int_\Omega (F^+ - F^-) = 0.$$

By Proposition 19.2.6(a), we thus have $F^+(x) = F^-(x)$ for almost
every $x$. But since $f$ is squeezed between $F^-$ and $F^+$, we thus
have $f(x) = F^+(x) = F^-(x)$ for almost every $x$. In particular, $f$
differs from the absolutely integrable function $F^+$ only on a set
of measure zero and is thus measurable (see Exercise 18.5.5) and
absolutely integrable, with

$$\int_\Omega f = \int_\Omega F^+ = \int_\Omega F^- = A$$

as desired. □




Exercise 19.3.1. Prove (19.1) whenever $\Omega$ is a measurable subset of $\mathbf{R}^n$
and $f$ is an absolutely integrable function.

Exercise 19.3.2. Prove Proposition 19.3.3. (Hint: for (b), break $f$, $g$, and
$f + g$ up into positive and negative parts, and try to write everything in
terms of integrals of non-negative functions only, using Lemma 19.2.10.)

Exercise 19.3.3. Let $f : \mathbf{R} \to \mathbf{R}$ and $g : \mathbf{R} \to \mathbf{R}$ be absolutely integrable,
measurable functions such that $f(x) \leq g(x)$ for all $x \in \mathbf{R}$, and that
$\int_{\mathbf{R}} f = \int_{\mathbf{R}} g$. Show that $f(x) = g(x)$ for almost every $x \in \mathbf{R}$ (i.e., that
$f(x) = g(x)$ for all $x \in \mathbf{R}$ except possibly for a set of measure zero).


19.4 Comparison with the Riemann integral 


We have spent a lot of effort constructing the Lebesgue integral, 
but have not yet addressed the question of how to actually com- 
pute any Lebesgue integrals, and whether Lebesgue integration 
is any different from the Riemann integral (say for integrals in 
one dimension). Now we show that the Lebesgue integral is a 
generalization of the Riemann integral. To clarify the following 
discussion, we shall temporarily distinguish the Riemann integral
from the Lebesgue integral by writing the Riemann integral $\int_I f$
as $R.\int_I f$.
Our objective here is to prove 


Proposition 19.4.1. Let $I \subseteq \mathbf{R}$ be an interval, and let $f : I \to \mathbf{R}$
be a Riemann integrable function. Then $f$ is also absolutely
integrable, and $\int_I f = R.\int_I f$.


Proof. Write $A := R.\int_I f$. Since $f$ is Riemann integrable, we
know that the upper and lower Riemann integrals are equal to $A$.
Thus, for every $\varepsilon > 0$, there exists a partition $\mathbf{P}$ of $I$ into smaller
intervals $J$ such that

$$A - \varepsilon \leq \sum_{J \in \mathbf{P}} |J| \inf_{x \in J} f(x) \leq A \leq \sum_{J \in \mathbf{P}} |J| \sup_{x \in J} f(x) \leq A + \varepsilon,$$

where $|J|$ denotes the length of $J$. Note that $|J|$ is the same as
$m(J)$, since $J$ is a box.




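Let $f_\varepsilon^- : I \to \mathbf{R}$ and $f_\varepsilon^+ : I \to \mathbf{R}$ be the functions

$$f_\varepsilon^-(x) = \sum_{J \in \mathbf{P}} \Big( \inf_{y \in J} f(y) \Big) \chi_J(x)$$

and

$$f_\varepsilon^+(x) = \sum_{J \in \mathbf{P}} \Big( \sup_{y \in J} f(y) \Big) \chi_J(x);$$

these are simple functions and hence measurable and absolutely
integrable. By Lemma 19.1.9 we have

$$\int_I f_\varepsilon^- = \sum_{J \in \mathbf{P}} |J| \inf_{x \in J} f(x)$$

and

$$\int_I f_\varepsilon^+ = \sum_{J \in \mathbf{P}} |J| \sup_{x \in J} f(x)$$

and hence

$$A - \varepsilon \leq \int_I f_\varepsilon^- \leq A \leq \int_I f_\varepsilon^+ \leq A + \varepsilon.$$

Since $f_\varepsilon^+$ majorizes $f$, and $f_\varepsilon^-$ minorizes $f$, we thus have

$$A - \varepsilon \leq \underline{\int}_I f \leq \overline{\int}_I f \leq A + \varepsilon$$

for every $\varepsilon$, and thus

$$\underline{\int}_I f = \overline{\int}_I f = A$$

and hence by Lemma 19.3.6, $f$ is absolutely integrable with $\int_I f = A$,
as desired. □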


Thus every Riemann integrable function is also Lebesgue
integrable, at least on bounded intervals, and we no longer need
the $R.\int_I f$ notation. However, the converse is not true. Take for
instance the function $f : [0,1] \to \mathbf{R}$ defined by $f(x) := 1$ when
$x$ is rational, and $f(x) := 0$ when $x$ is irrational. Then from
Proposition 11.7.1 we know that $f$ is not Riemann integrable.
On the other hand, $f$ is the characteristic function of the set
$\mathbf{Q} \cap [0,1]$, which is countable and hence has measure zero. Thus $f$
is Lebesgue integrable and $\int_{[0,1]} f = 0$. Thus the Lebesgue integral
can handle more functions than the Riemann integral; this is
one of the primary reasons why we use the Lebesgue integral in
analysis. (The other reason is that the Lebesgue integral interacts
well with limits, as the Lebesgue monotone convergence theorem,
Fatou’s lemma, and Lebesgue dominated convergence theorem
already attest. There are no comparable theorems for the Riemann
integral.)
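
The contrast between the two integrals for this function can be written out explicitly. The following is only a sketch; the enumeration $q_1, q_2, \ldots$ of $\mathbf{Q} \cap [0,1]$ and the partition symbol $\mathbf{P}$ are notation introduced here for illustration.

```latex
% Lebesgue side: \mathbf{Q} \cap [0,1] = \{q_1, q_2, \ldots\} is countable,
% and each single point has measure zero.
\[
m(\mathbf{Q} \cap [0,1]) \le \sum_{k=1}^{\infty} m(\{q_k\}) = 0,
\qquad\text{so}\qquad
\int_{[0,1]} f = \int_{[0,1]} \chi_{\mathbf{Q} \cap [0,1]} = m(\mathbf{Q} \cap [0,1]) = 0 .
\]
% Riemann side: every interval J of positive length contains both rational and
% irrational points, so \sup_J f = 1 and \inf_J f = 0; intervals of length zero
% contribute nothing. Hence for every partition \mathbf{P} of [0,1]:
\[
\sum_{J \in \mathbf{P}} |J| \sup_{x \in J} f(x) = 1,
\qquad
\sum_{J \in \mathbf{P}} |J| \inf_{x \in J} f(x) = 0 .
\]
```

Since the upper and lower Riemann sums never approach each other, the upper and lower Riemann integrals are $1$ and $0$ respectively, while the Lebesgue integral exists and equals $0$.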


19.5 Fubini’s theorem 


In one dimension we have shown that the Lebesgue integral is 
connected to the Riemann integral. Now we will try to understand 
the connection in higher dimensions. To simplify the discussion we 
shall just study two-dimensional integrals, although the arguments 
we present here can easily be extended to higher dimensions. 

We shall study integrals of the form $\int_{\mathbf{R}^2} f$. Note that once
we know how to integrate on $\mathbf{R}^2$, we can integrate on measurable
subsets $\Omega$ of $\mathbf{R}^2$, since $\int_\Omega f$ can be rewritten as $\int_{\mathbf{R}^2} f \chi_\Omega$.

Let $f(x,y)$ be a function of two variables. In principle, we
have three different ways to integrate $f$ on $\mathbf{R}^2$. First of all, we
can use the two-dimensional Lebesgue integral, to obtain $\int_{\mathbf{R}^2} f$.
Secondly, we can fix $x$ and compute a one-dimensional integral in
$y$, and then take that quantity and integrate in $x$, thus obtaining
$\int_{\mathbf{R}} (\int_{\mathbf{R}} f(x,y)\, dy)\, dx$. Thirdly, we could fix $y$ and integrate in
$x$, and then integrate in $y$, thus obtaining $\int_{\mathbf{R}} (\int_{\mathbf{R}} f(x,y)\, dx)\, dy$.

Fortunately, if the function $f$ is absolutely integrable on $\mathbf{R}^2$,
then all three integrals are equal:


Theorem 19.5.1 (Fubini’s theorem). Let $f : \mathbf{R}^2 \to \mathbf{R}$ be an
absolutely integrable function. Then there exist absolutely integrable
functions $F : \mathbf{R} \to \mathbf{R}$ and $G : \mathbf{R} \to \mathbf{R}$ such that for almost every
$x$, $f(x,y)$ is absolutely integrable in $y$ with

$$F(x) = \int_{\mathbf{R}} f(x,y)\, dy,$$

and for almost every $y$, $f(x,y)$ is absolutely integrable in $x$ with

$$G(y) = \int_{\mathbf{R}} f(x,y)\, dx.$$

Finally, we have

$$\int_{\mathbf{R}} F(x)\, dx = \int_{\mathbf{R}^2} f = \int_{\mathbf{R}} G(y)\, dy.$$

Remark 19.5.2. Very roughly speaking, Fubini’s theorem says
that

$$\int_{\mathbf{R}} \Big( \int_{\mathbf{R}} f(x,y)\, dy \Big) dx = \int_{\mathbf{R}^2} f = \int_{\mathbf{R}} \Big( \int_{\mathbf{R}} f(x,y)\, dx \Big) dy.$$

This allows us to compute two-dimensional integrals by splitting
them into two one-dimensional integrals. The reason why we do
not write Fubini’s theorem this way, though, is that it is possible
that the integral $\int_{\mathbf{R}} f(x,y)\, dy$ does not actually exist for every
$x$, and similarly $\int_{\mathbf{R}} f(x,y)\, dx$ does not exist for every $y$; Fubini’s
theorem asserts only that these integrals exist for almost
every $x$ and $y$. For instance, if $f(x,y)$ is the function which equals
$1$ when $y > 0$ and $x = 0$, equals $-1$ when $y < 0$ and $x = 0$,
and is zero otherwise, then $f$ is absolutely integrable on $\mathbf{R}^2$ and
$\int_{\mathbf{R}^2} f = 0$ (since $f$ equals zero almost everywhere in $\mathbf{R}^2$), but
$f(x,y)$ is not absolutely integrable in $y$ when $x = 0$ (though it
is absolutely integrable in $y$ for every other $x$).
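
For a case where nothing goes wrong, the three integrals can be compared directly. The function below is an assumption chosen for illustration (it is bounded, measurable, and supported on the unit square, hence absolutely integrable); the one-dimensional integrals are evaluated as ordinary Riemann integrals.

```latex
% Illustrative choice (not from the text): f(x,y) := x y \,\chi_{[0,1]\times[0,1]}(x,y).
\[
F(x) = \int_{\mathbf{R}} f(x,y)\,dy
     = \chi_{[0,1]}(x) \int_{[0,1]} x y \,dy
     = \frac{x}{2}\,\chi_{[0,1]}(x),
\qquad
G(y) = \int_{\mathbf{R}} f(x,y)\,dx = \frac{y}{2}\,\chi_{[0,1]}(y),
\]
% Both iterated integrals agree, and Fubini's theorem identifies the common
% value with the two-dimensional Lebesgue integral:
\[
\int_{\mathbf{R}} F(x)\,dx = \int_{[0,1]} \frac{x}{2}\,dx = \frac14
  = \int_{\mathbf{R}} G(y)\,dy,
\qquad\text{so}\qquad
\int_{\mathbf{R}^2} f = \frac14 .
\]
```

Here $F$ and $G$ exist for every $x$ and $y$, so the "almost every" caveat in the theorem is not needed for this particular $f$.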


Proof. The proof of Fubini’s theorem is quite complicated and we 
will only give a sketch here. We begin with a series of reductions. 

Roughly speaking (ignoring issues relating to sets of measure
zero), we have to show that

$$\int_{\mathbf{R}} \Big( \int_{\mathbf{R}} f(x,y)\, dy \Big) dx = \int_{\mathbf{R}^2} f$$

together with a similar equality with $x$ and $y$ reversed. We shall
just prove the above equality, as the other one is very similar.

First of all, it suffices to prove the theorem for non-negative
functions, since the general case then follows by writing a general
function $f$ as a difference $f^+ - f^-$ of two non-negative functions,
and applying Fubini’s theorem to $f^+$ and $f^-$ separately (and using
Proposition 19.3.3(a) and (b)). Thus we will henceforth assume
that $f$ is non-negative.

Next, it suffices to prove the theorem for non-negative functions
$f$ supported on a bounded set such as $[-N, N] \times [-N, N]$
for some positive integer $N$. Indeed, once one obtains Fubini’s
theorem for such functions, one can then write a general function
$f$ as the supremum of such compactly supported functions as

$$f = \sup_{N > 0} f \chi_{[-N,N] \times [-N,N]},$$

apply Fubini’s theorem to each function $f \chi_{[-N,N] \times [-N,N]}$ separately,
and then take suprema using the monotone convergence
theorem. Thus we will henceforth assume that $f$ is supported on
$[-N, N] \times [-N, N]$.

By another similar argument, it suffices to prove the theorem
for non-negative simple functions supported on $[-N, N] \times [-N, N]$,
since one can use Lemma 19.1.4 to write $f$ as the supremum
of simple functions (which must also be supported on
$[-N, N] \times [-N, N]$), apply Fubini’s theorem to each simple function,
and then take suprema using the monotone convergence theorem.
Thus we may assume that $f$ is a non-negative simple function
supported on $[-N, N] \times [-N, N]$.

Next, we see that it suffices to prove the theorem for characteristic
functions supported in $[-N, N] \times [-N, N]$. This is because
every simple function is a linear combination of characteristic
functions, and so we can deduce Fubini’s theorem for simple functions
from Fubini’s theorem for characteristic functions. Thus we may
take $f = \chi_E$ for some measurable $E \subseteq [-N, N] \times [-N, N]$. Our
task is then to show (ignoring sets of measure zero) that

$$\int_{[-N,N]} \Big( \int_{[-N,N]} \chi_E(x,y)\, dy \Big) dx = m(E).$$
[—N,N] J[-N,N] 




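It will suffice to show the upper Lebesgue integral estimate

$$\overline{\int}_{[-N,N]} \Big( \overline{\int}_{[-N,N]} \chi_E(x,y)\, dy \Big) dx \leq m(E). \tag{19.2}$$

We will prove this estimate later. Once we show this for every set
$E$, we may substitute $E$ with $[-N, N] \times [-N, N] \setminus E$ and obtain

$$\overline{\int}_{[-N,N]} \Big( \overline{\int}_{[-N,N]} (1 - \chi_E(x,y))\, dy \Big) dx \leq 4N^2 - m(E).$$

But the left-hand side is equal to

$$\overline{\int}_{[-N,N]} \Big( 2N - \underline{\int}_{[-N,N]} \chi_E(x,y)\, dy \Big) dx$$

which is in turn equal to

$$4N^2 - \underline{\int}_{[-N,N]} \Big( \underline{\int}_{[-N,N]} \chi_E(x,y)\, dy \Big) dx$$

and thus we have

$$\underline{\int}_{[-N,N]} \Big( \underline{\int}_{[-N,N]} \chi_E(x,y)\, dy \Big) dx \geq m(E).$$

In particular we have

$$\underline{\int}_{[-N,N]} \Big( \overline{\int}_{[-N,N]} \chi_E(x,y)\, dy \Big) dx \geq m(E)$$

and hence by Lemma 19.3.6 we see that $\overline{\int}_{[-N,N]} \chi_E(x,y)\, dy$ is
absolutely integrable and

$$\int_{[-N,N]} \Big( \overline{\int}_{[-N,N]} \chi_E(x,y)\, dy \Big) dx = m(E).$$

A similar argument shows that

$$\int_{[-N,N]} \Big( \underline{\int}_{[-N,N]} \chi_E(x,y)\, dy \Big) dx = m(E)$$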




and hence

$$\int_{[-N,N]} \Big( \overline{\int}_{[-N,N]} \chi_E(x,y)\, dy - \underline{\int}_{[-N,N]} \chi_E(x,y)\, dy \Big) dx = 0.$$

Thus by Proposition 19.2.6(a) we have

$$\underline{\int}_{[-N,N]} \chi_E(x,y)\, dy = \overline{\int}_{[-N,N]} \chi_E(x,y)\, dy$$

for almost every $x \in [-N, N]$. Thus $\chi_E(x,y)$ is absolutely integrable
in $y$ for almost every $x$, and $\int_{[-N,N]} \chi_E(x,y)\, dy$ is thus equal
(almost everywhere) to a function $F(x)$ such that

$$\int_{[-N,N]} F(x)\, dx = m(E)$$

as desired.

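It remains to prove the bound (19.2). Let $\varepsilon > 0$ be arbitrary.
Since $m(E)$ is the same as the outer measure $m^*(E)$, we know
that there exists an at most countable collection $(B_j)_{j \in J}$ of boxes
such that $E \subseteq \bigcup_{j \in J} B_j$ and

$$\sum_{j \in J} m(B_j) \leq m(E) + \varepsilon.$$

Each box $B_j$ can be written as $B_j = I_j \times I'_j$ for some intervals $I_j$
and $I'_j$. Observe that

$$m(B_j) = |I_j|\,|I'_j| = \int_{I_j} |I'_j|\, dx = \int_{I_j} \Big( \int_{I'_j} dy \Big) dx$$

$$= \int_{[-N,N]} \Big( \int_{[-N,N]} \chi_{I_j \times I'_j}(x,y)\, dy \Big) dx = \int_{[-N,N]} \Big( \int_{[-N,N]} \chi_{B_j}(x,y)\, dy \Big) dx.$$

Adding this over all $j \in J$ (using Corollary 19.2.11) we obtain

$$\sum_{j \in J} m(B_j) = \int_{[-N,N]} \Big( \int_{[-N,N]} \sum_{j \in J} \chi_{B_j}(x,y)\, dy \Big) dx.$$

In particular we have

$$\overline{\int}_{[-N,N]} \Big( \overline{\int}_{[-N,N]} \sum_{j \in J} \chi_{B_j}(x,y)\, dy \Big) dx \leq m(E) + \varepsilon.$$

But $\sum_{j \in J} \chi_{B_j}$ majorizes $\chi_E$ (why?) and thus

$$\overline{\int}_{[-N,N]} \Big( \overline{\int}_{[-N,N]} \chi_E(x,y)\, dy \Big) dx \leq m(E) + \varepsilon.$$

But $\varepsilon$ is arbitrary, and so we have (19.2) as desired. This completes
the proof of Fubini’s theorem. □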


Index 


++ (increment), 18, 56 
on integers, 87 
+C, 342 
α-length, 334
ε-adherent, 161, 244
continually ε-adherent, 161
ε-close
eventual, 115
functions, 253
local, 253
rationals, 99
reals, 146
sequences, 115, 147
ε-steady, 110, 146
eventually ε-steady, 111,
146
π, 506, 508
σ-algebra, 576, 595


a posteriori, 20 
a priori, 20 
Abel’s theorem, 484 
absolute convergence 
for series, 192, 220 
test, 192 
absolute value 
for complex numbers, 498 
for rationals, 98 
for reals, 129 


absolutely integrable, 617 
absorption laws, 52 
abstraction, 24-25, 390 
addition 
long, 385 
of cardinals, 82 
of functions, 252 
of complex numbers, 496 
of integers, 86 
of natural numbers, 27 
of rationals, 93 
of reals, 119 
(countably) additive measure, 
576, 592 
adherent point 
infinite, 286 
of sequences: see limit point
of sequences
of sets: 245, 402, 435 
alternating series test, 193 
ambient space, 406 
analysis, 1 
and: see conjunction 
antiderivative, 340 
approximation to the identity, 
466, 469, 523 
Archimedian property, 132 
arctangent: see trigonomet- 
ric functions 


Aristotlean logic, 373-374
associativity
of addition in C, 496
of addition in N, 29
of composition, 59-60
of multiplication in N, 34
of scalar multiplication, 538
of vector addition, 534
see also: ring, field, laws of algebra
asymptotic discontinuity, 268
Axiom(s)
in mathematics, 24-25
of choice, 40, 73, 229
of comprehension: see Axiom of universal specification
of countable choice, 230
of equality, 377
of foundation: see Axiom of regularity
of induction: see principle of mathematical induction
of infinity, 50
of natural numbers: see Peano axioms
of pairwise union, 42
of power set, 66
of reflexivity, 377
of regularity, 54
of replacement, 49
of separation, 45
of set theory, 38, 40-42, 45, 49-50, 54, 66
of singleton sets and pair sets, 41
of specification, 45
of symmetry, 377
of substitution, 57, 377
of the empty set, 40
of transitivity, 377
of universal specification, 52
of union, 67

ball, 400 
Banach-Tarski paradox, 575, 
590 
base of the natural logarithm: 
see e 
basis 
standard basis of row vec- 
tors, 535 
bijection, 62 
binomial formula, 189 
Bolzano- Weierstrass theorem, 
174 
Boolean algebra, 47, 576, 591 
Boolean logic, 367 
Borel property, 575, 596 
Borel-Cantelli lemma, 615 
bound variable, 180, 369, 376 
boundary (point), 401, 435 
bounded 
from above and below, 269 
function, 269, 451 
interval, 244 
sequence, 113, 150 
sequence away from zero, 
123, 127 
set, 248, 413 


C, C^0, C^1, C^2, C^k, 556
cancellation law 
of addition in N, 29 
of multiplication in N, 35 
of multiplication in Z, 91 
of multiplication in R, 126 
Cantor’s theorem, 224 
cardinality 
arithmetic of, 81 
of finite sets, 80 
uniqueness of, 80 
Cartesian product, 70-71 
infinite, 228-229 
Cauchy criterion, 197 
Cauchy sequence, 111, 146, 409 
Cauchy-Schwarz inequality, 399, 
516 
chain: see totally ordered set 
chain rule, 293 
in higher dimensions, 552, 
555 
change of variables formula, 
346-358 
character, 518 
characteristic function, 602 
choice 
single, 40 
finite, 73 
countable, 230 
arbitrary, 229 
closed 
box, 580 
interval, 243 
set, 403, 435 
Clairaut’s theorem: see interchanging derivatives
with derivatives
closure, 245, 403, 435 
cluster point: see limit point 
cocountable topology, 438 
coefficient, 476 
cofinite topology, 437 
column vector, 534 
common refinement, 311 
commutativity 
of addition in C, 496 
of addition in N, 29 
of addition in vector spaces, 
534 
of convolution, 467, 522 
of multiplication in N, 34 
see also: ring, field, laws 
of algebra 
compactness, 413, 436 
compact support, 465 
comparison principle (or test) 
for finite series, 181 
for infinite series, 196 
for sequences, 166 
completeness 
of the space of continu- 
ous functions, 454 
of metric spaces, 410 
of the reals, 168 
completion of a metric space, 
412 
complex numbers C, 495 
complex conjugation, 498 
composition of functions, 59 
conjunction (and), 354 
connectedness, 307, 430 
connected component, 433 




constant 
function, 58, 312 
sequence, 170 
continuity, 261, 420, 436 
and compactness, 427 
and connectedness, 431 
and convergence, 255, 421 
continuum, 242 
hypothesis, 227 
contraction, 558 
mapping theorem, 559 
contrapositive, 362 
convergence 
in L?, 517 
of a function at a point, 
254, 441 
of sequences, 148, 394, 434 
of series, 190 
pointwise: see pointwise 
convergence 
uniform: see uniform con- 
vergence 
converse, 362 
convolution, 466, 487, 522 
corollary, 28 
coset, 587 
cosine: see trigonometric func- 
tions 
cotangent: see trigonometric 
functions 
countability, 208 
of the integers, 212 
of the rationals, 214 
cover, 578 
see also: open cover 
critical point, 572 




de Moivre identities, 507 
de Morgan laws, 47 
decimal 
negative integer, 384 
non-uniqueness of repre- 
sentation, 388 
point, 385 
positive integer, 384 
real, 385-386 
degree, 464 
denumerable: see countable 
dense, 465 
derivative, 288 
directional, 544 
in higher dimensions, 542,
544, 546, 551
partial, 546 
matrix, 551 
total, 542, 544 
uniqueness of, 542 
difference rule, 293 
difference set, 47 
differential matrix: see deriv- 
ative matrix 
differentiability 
at a point, 288 
continuous, 556 
directional, 544 
in higher dimensions, 542 
infinite, 478 
k-fold, 478, 556 
digit, 381 
dilation, 536 
diophantine, 616 
Dirac delta function, 466 
direct sum 




of functions, 75, 425 
discontinuity: see singularity 
discrete metric, 393 
disjoint sets, 47 
disjunction (or), 354 

inclusive vs. exclusive, 354 
distance 

in C, 499 

in Q, 98 

in R, 145, 391 
distributive law 

for natural numbers, 34 

for complex numbers, 497 

see also: laws of algebra 
divergence 

of series, 3, 190 

of sequences, 4 

see also: convergence 
divisibility, 237 
division 

by zero, 3 

formal (//), 93 

of functions, 252 

of rationals, 96 
domain, 55 
dominate: see majorize 

dominated convergence: see 

Lebesgue dominated 
convergence theorem 
doubly infinite, 244 
dummy variable: see bound 
variable 


e, 491 
Egoroff’s theorem, 617 
empty 

Cartesian product, 73 


function, 58 
sequence, 73 
series, 185 
set, 40, 576, 579 
equality, 377 
for functions, 58 
for sets, 39 
of cardinality, 77 
equivalence 
of sequences, 116, 281 
relation, 378 
error-correcting codes, 392 
Euclidean algorithm, 35 
Euclidean metric, 391 
Euclidean space, 391 
Euler’s formula, 503, 506 
Euler’s number: see e 
exponential function, 490, 501 
exponentiation 
of cardinals, 81 
with base and exponent 
in N, 36 
with base in Q and expo- 
nent in Z, 101,102 
with base in R and expo- 
nent in Z, 140 
with base in R^+ and exponent
in Q, 142
with base in R* and ex- 
ponent in R, 177 
expression, 353 
extended real number system 
R*, 137, 153 
extremum: see maximum, min- 
imum 
exterior (point), 401, 435 




factorial, 189 
family, 67 
Fatou’s lemma, 614 
Fejér kernel, 524 
field, 95 
ordered, 97 
finite intersection property, 419 
finite set, 80 
fixed point theorem, 276, 559 
forward image: see image 
Fourier 
coefficients, 520 
inversion formula, 520 
series, 520 
series for arbitrary peri- 
ods, 531 
theorem, 536 
transform, 520 
fractional part, 512 
free variable, 368 
frequency, 518 
Fubini’s theorem, 624 
for finite series, 188 
for infinite series, 217 
see also: interchanging in- 
tegrals/sums with in- 
tegrals/sums 
function, 55 
implicit definition, 57 
fundamental theorems of cal- 
culus, 338, 341 


geometric series, 190, 196 
formula, 197, 200, 460 

geodesic, 394 

gradient, 550 

graph, 58, 75, 251, 568 




greatest lower bound: see least 
upper bound 


hairy ball theorem, 559 
half-infinite, 244 
half-open, 243 
half-space, 591 
harmonic series, 199 
Hausdorff space, 437, 438 
Hausdorff maximality princi- 
ple, 240 
Heine-Borel theorem, 414 
for the real line, 248 
Hermitian form, 515 
homogeneity, 516, 535 
hypersurface, 568 


identity map (or operator), 63, 
536 

if: see implication 

iff (if and only if), 30 

ill-defined, 351,353 

image 

of sets, 64 
inverse image, 65 

imaginary, 498 

implication (if), 357 

implicit differentiation, 568 

implicit function theorem, 568 

improper integral, 318 

inclusion map, 63 

inconsistent, 227, 228, 502 

index of summation: see dummy 
variable 

index set, 67 

indicator function: see char- 
acteristic function 


induced
metric, 391, 407
topology, 407, 435
induction: see Principle of mathematical induction
infinite 
interval, 244 
set, 80 
infimum: see supremum 
injection: see one-to-one func- 
tion 
inner product, 514 
integer part, 103, 133, 512 
integers Z 
definition, 85 
identification with ratio- 
nals, 94 
interspersing with ratio- 
nals, 103 
integral test, 332 
integration 
by parts, 343-345, 484 
laws, 315, 321 
piecewise constant, 313, 
315 
Riemann: see Riemann in- 
tegral 
interchanging 


derivatives with derivatives, 


10, 556 


finite sums with finite sums, 


187, 188 

integrals with integrals, 7, 
614, 624 

limits with derivatives, 9, 
463 




limits with integrals, 9, 
462, 610, 619 
limits with length, 12 
limits with limits, 8, 9, 
450 
limits with sums, 617 
sums with derivatives, 463, 
476 
sums with integrals, 459, 
476, 613, 614, 616 
sums with sums, 6, 217 
interior (point), 401, 435 
intermediate value theorem, 
274, 432 
intersection 
pairwise, 46 
interval, 243 
intrinsic, 413 
inverse
function theorem, 301, 562 
image, 65 
in logic, 362 
of functions, 63 
invertible function: see bijec- 
tion 
local, 562 
involution, 498 
irrationality, 108 
of √2, 104, 137
isolated point, 247 
isometry, 406 


jump discontinuity, 268 


l^1, l^2, l^∞, L^1, L^2, L^∞, 391-393, 516, 617


equivalence of in finite di- 
mensions, 396 
see also: absolutely inte- 
grable 
see also: supremum as norm 
L’Hôpital’s rule, 11, 303
label, 67 
laws of algebra 
for complex numbers, 496, 
497 
for integers, 89 
for rationals, 95 
for reals, 122 
laws of arithmetic: see laws 
of algebra 
laws of exponentiation, 101, 
102, 141, 143, 177, 490 
least upper bound, 134 
least upper bound prop- 
erty, 135, 158 
see also supremum 
Lebesgue dominated conver- 
gence theorem, 619 
Lebesgue integral 
of absolutely integrable func- 
tions, 618 
of nonnegative functions, 
608 
of simple functions, 604 
upper and lower, 620
vs. the Riemann integral, 
622 
Lebesgue measurable, 590 
Lebesgue measure, 577 
motivation of, 575-577 
Lebesgue monotone convergence
theorem, 610
Leibnitz rule, 293, 554 
lemma, 28 
length of interval, 308 
limit 

at infinity, 286 

formal (LIM), 118, 150, 
412 

laws, 150, 256, 500 

left and right, 265 

limiting values of functions, 

5, 254, 441 

of sequences, 148 
pointwise, 444 

uniform, see uniform limit 
uniqueness of, 148, 256, 


397, 442 
limit inferior, see limit supe- 
rior 


limit point 
of sequences, 160, 409 
of sets, 247 
limit superior, 162 
linear combination, 535 
linearity 
approximate, 541 
of convolution, 471, 522 
of finite series, 186 
of limits, 151 
of infinite series, 194 
of inner product, 515 
of integration, 315, 321, 
606, 612 
of transformations, 535 
Lipschitz constant, 298 
Lipschitz continuous, 298 




logarithm (natural), 492 
power series of, 460, 492 

logical connective, 354 

lower bound: see upper bound 


majorize, 317, 608 
manifold, 572 
map: see function 
matrix, 536 
identification with linear 
transformations, 537- 
540 
maximum, 233, 296 
local, 296 
of functions, 252, 271 
principle, 271, 427 
mean value theorem, 297 
measurability 
for functions, 597, 598 
for sets, 590 
motivation of, 574 
see also: Lebesgue mea- 
sure, outer measure 
meta-proof, 140 
metric, 390 
ball: see ball 
on C, 499 
on R, 391 
space, 390 
see also: distance 
minimum, 233, 296 
local, 296 
of a set of natural num- 
bers, 210 
of functions, 252, 271 
minorize: see majorize 
monomial, 518 




monotone (increasing or de- 
creasing) 
convergence: see Lebesgue 
monotone convergence 
theorem 
function, 276, 336 
measure, 576, 580 
sequence, 159 
morphism: see function 
moving bump example, 446, 
614 
multiplication 
of cardinals, 81 
of complex numbers, 497 
of functions, 252 
of integers, 86 
of matrices, 536, 540 
of natural numbers, 33 
of rationals, 93, 94 
of reals, 120 


Natural numbers N 
are infinite, 80 
axioms: see Peano axioms 
identification with integers, 
87 
informal definition, 17 
in set theory: see Axiom 
of infinity 
uniqueness of, 76 
negation 
in logic, 355 
of extended reals, 154 
of complex numbers, 497 
of integers, 88 
of rationals, 93 
of reals, 121 




negative: see negation, posi- 
tive 

neighbourhood, 434 

Newton’s approximation, 291, 
544 

non-constructive, 229 

non-degenerate, 516 

nowhere differentiable function, 
464, 508 


objects, 38 
primitive, 53 
one-to-one function, 61 
one-to-one correspondence: see 
bijection 
onto, 61 
open 
box, 578 
cover, 414 
interval, 243 
set, 403 
or: see disjunction 
order ideal, 238 
order topology, 437 
ordered pair, 70 
construction of, 74 
ordered n-tuple, 71 
ordering 
lexicographical, 239 
of cardinals, 227 
of orderings, 240 
of partitions, 310 
of sets, 233 
of the extended reals, 154 
of the integers, 91 
of the natural numbers, 
31 




of the rationals, 97 

of the reals, 129 
orthogonality, 516 
orthonormal, 519 
oscillatory discontinuity, 268 
outer measure, 579 

non-additivity of, 587, 589 


pair set, 41 
partial function, 69 
partially ordered set, 45, 232 
partial sum, 190 
Parseval identity, 531 
see also: Plancherel for- 
mula 
partition, 308 
path-connected, 432 
Peano axioms, 18-21, 23 
perfect matching: see bijec- 
tion 
periodic, 511 
extension, 512 
piecewise 
constant, 312 
constant Riemann-Stieltjes 
integral, 335 
continuous, 330 
pigeonhole principle, 83 
Plancherel formula (or theo- 
rem), 520, 528 
pointwise convergence, 444 
of series, 456 
topology of, 455 
polar representation, 507 
polynomial, 265, 464 
and convolution, 467 
approximation by, 465, 470 




positive 
complex number, 498, 502 
integer, 88 
inner product, 515 
measure, 576, 580 
natural number, 30 
rational, 96 
real, 128 
power series, 476 
formal, 474 
multiplication of, 487 
uniqueness of, 481 
power set, 66 
pre-image: see inverse image 
principle of infinite descent, 
106 
principle of mathematical in- 
duction, 21 
backwards induction, 33 
strong induction, 32, 234 
transfinite, 237 
product rule, see Leibnitz rule 
product topology, 455 
projection, 536 
proof 
by contradiction, 352, 363 
abstract examples, 364-367, 
375-377 
proper subset, 44 
property, 354 
proposition, 28 
propositional logic, 367 
Pythagoras’ theorem, 516 


quantifier, 369 
existential (for some), 371 
negation of, 372 


nested, 372 

universal (for all), 370 
Quotient: see division 
Quotient rule, 293, 555 


radius of convergence, 475 
range, 55 
ratio test, 206 
rational numbers Q 
definition, 93 
identification with reals, 
121 
interspersing with ratio- 
nals, 103 
interspersing with reals, 
132 
real analytic, 478 
real numbers R 
are uncountable: see un- 
countability of the re- 
als 
definition, 117 
real part, 498 
real-valued, 455 
rearrangement 
of absolutely convergent 
series, 202 
of divergent series, 203, 
222 
of finite series, 185 
of non-negative series, 200 
reciprocal 
of complex numbers, 499 
of rationals, 95 
of reals, 125 
recursive definitions, 26,76 




reductio ad absurdum: see proof 
by contradiction 
relative topology: see induced 
topology 
removable discontinuity: see 
removable singularity 
removable singularity, 259, 268 
restriction of functions, 250 
Riemann hypothesis, 200 
Riemann integrability, 318 
closure properties, 321-326 
failure of, 332 
of bounded continuous functions, 328
of continuous functions on compacta, 328
of monotone functions, 330
of piecewise continuous bounded functions, 329
of uniformly continuous functions, 326
Riemann integral, 318 
upper and lower, 317 
Riemann sums (upper and lower), 321
Riemann zeta function, 199 
Riemann-Stieltjes integral, 336 
ring, 89 
commutative, 89, 497 
Rolle’s theorem, 297 
root, 140 
mean square: see L²
test, 204 
row vector, 533 
Russell’s paradox, 52 


scalar multiplication, 533
of functions, 252 
Schröder-Bernstein theorem, 227
sequence, 109 
finite, 74 
series 
finite, 179, 182 
formal infinite, 189 
laws, 194, 220 
of functions, 459 
on arbitrary sets, 220 
on countable sets, 216 
vs. sum, 180 
set 
axioms: see axioms of set theory
informal definition, 38 
signum function, 258 
simple function, 602 
sine: see trigonometric functions
singleton set, 41 
singularity, 268 
space, 390 
statement, 350 
sub-additive measure, 576, 580 
subset, 44 
subsequence, 172, 408 
substitution: see rearrangement
subtraction 
formal (——), 86 
of functions, 252 
of integers, 91 
sum rule, 292 
summation by parts, 484
sup norm: see supremum as norm
support, 465 
supremum (and infimum) 
as metric, 393 
as norm, 393, 457 
of a set of extended reals, 156, 157
of a set of reals, 137, 139 
of sequences of reals, 158 
square root, 56 
Square wave, 512, 518 
Squeeze test 
for sequences, 167 
Stone-Weierstrass theorem, 472, 522
strict upper bound, 235 
surjection: see onto 


taxi-cab metric, 392
tangent: see trigonometric function
Taylor series, 480
Taylor’s formula: see Taylor series
telescoping series, 195
ten, 381
theorem, 28
topological space, 433
totally bounded, 418
totally ordered set, 45, 233
transformation: see function
translation invariance, 577, 580, 591
transpose, 534

triangle inequality
in Euclidean space, 399
in inner product spaces, 516
in metric spaces, 390 
in C, 499 
in R, 99 
for finite series, 181, 186 
for integrals, 618 
trichotomy of order 
of extended reals, 155 
for natural numbers, 31 
for integers, 91 
for rationals, 97 
for reals, 129 
trigonometric functions, 503, 509
and Fourier series, 530 
trigonometric polynomials, 518
power series, 504, 508 
trivial topology, 437 
two-to-one function, 61 


uncountability, 208 
of the reals, 225 
undecidable, 228 
uniform continuity, 280, 428 
uniform convergence, 447 
and anti-derivatives, 462 
and derivatives, 451 
and integrals, 459 
and limits, 450 
and radius of convergence, 476
as a metric, 453, 514 
of series, 457 
uniform limit, 447 
of bounded functions, 451
of continuous functions, 450 
and Riemann integration, 458
union, 67 
pairwise, 42 
universal set, 53 
upper bound
of a set of reals, 133 
of a partially ordered set, 234
see also: least upper bound 


variable, 368
vector space, 534
vertical line test, 55, 76, 567
volume, 578 


Weierstrass approximation theorem, 465, 470-471, 521
Weierstrass example: see nowhere differentiable function
Weierstrass M-test, 457 
well-defined, 351 
well-ordered sets, 234 
well ordering principle 
for natural numbers, 210 
for arbitrary sets, 241 


Zermelo-Fraenkel(-Choice) axioms, 69
see also axioms of set theory
zero test 
for sequences, 167 
for series, 191 


Zorn’s lemma, 23 


Texts and Readings in Mathematics 


1. R. B. Bapat: Linear Algebra and Linear Models (Second Edition)
2. Rajendra Bhatia: Fourier Series (Second Edition)
3. C. Musili: Representations of Finite Groups
4. H. Helson: Linear Algebra (Second Edition)
5. D. Sarason: Notes on Complex Function Theory
6. M. G. Nadkarni: Basic Ergodic Theory (Second Edition)
7. H. Helson: Harmonic Analysis (Second Edition)
8. K. Chandrasekharan: A Course on Integration Theory
9. K. Chandrasekharan: A Course on Topological Groups
10. R. Bhatia (ed.): Analysis, Geometry and Probability
11. K. R. Davidson: C*-Algebras by Example
12. M. Bhattacharjee et al.: Notes on Infinite Permutation Groups
13. V. S. Sunder: Functional Analysis: Spectral Theory
14. V. S. Varadarajan: Algebra in Ancient and Modern Times
15. M. G. Nadkarni: Spectral Theory of Dynamical Systems
16. A. Borel: Semisimple Groups and Riemannian Symmetric Spaces
17. M. Marcolli: Seiberg-Witten Gauge Theory
18. A. Böttcher and S. M. Grudsky: Toeplitz Matrices, Asymptotic Linear Algebra and Functional Analysis
19. A. R. Rao and P. Bhimasankaram: Linear Algebra (Second Edition)
20. C. Musili: Algebraic Geometry for Beginners
21. A. R. Rajwade: Convex Polyhedra with Regularity Conditions and Hilbert's Third Problem
22. S. Kumaresan: A Course in Differential Geometry and Lie Groups
23. Stef Tijs: Introduction to Game Theory
24. B. Sury: The Congruence Subgroup Problem
25. R. Bhatia (ed.): Connected at Infinity
26. K. Mukherjea: Differential Calculus in Normed Linear Spaces
27. Satya Deo: Algebraic Topology: A Primer
28. S. Kesavan: Nonlinear Functional Analysis: A First Course
29. S. Szabó: Topics in Factorization of Abelian Groups
30. S. Kumaresan and G. Santhanam: An Expedition to Geometry
31. D. Mumford: Lectures on Curves on an Algebraic Surface (Reprint)
32. J. W. Milnor and J. D. Stasheff: Characteristic Classes (Reprint)
33. K. R. Parthasarathy: Introduction to Probability and Measure (Corrected Reprint)
34. A. Mukherjee: Topics in Differential Topology
35. K. R. Parthasarathy: Mathematical Foundations of Quantum Mechanics
36. K. B. Athreya and S. N. Lahiri: Measure Theory
37. Terence Tao: Analysis I


